aws-samples / bedrock-claude-chat

AWS-native chatbot using Bedrock + Claude (+Mistral)
MIT No Attribution

[Feature Request] I want to speed up analysis with Athena. #223

Closed Taikono-Himazin closed 5 months ago

Taikono-Himazin commented 6 months ago

Describe the solution you'd like

Analysis using Athena has been introduced, but even a simple query like the one below takes more than 10 minutes:

select * from ddb_export order by metadata.writetimestampmicros.n DESC limit 10;

I want to shorten this to a few seconds.

Why the solution needed

As it stands now, it is too slow to be practical.

Additional context

This seems to be caused by the large number of small files in the S3 bucket being analyzed. When I counted, there were 7,700 files totaling 2.2 MB. If this continues, S3 request fees will also increase. According to the article below, it may be better to combine the files. https://aws.amazon.com/jp/blogs/news/top-10-performance-tuning-tips-for-amazon-athena
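To put the numbers above in perspective, here is a rough calculation of the average object size. It assumes "2.2MB" means 2,200,000 bytes. The 128 MB figure is the rule-of-thumb minimum file size I recall from the linked tuning article, so treat it as an assumption rather than a hard limit.

```python
# Rough arithmetic on the figures reported above (7700 files, 2.2 MB total).
# Assumption: "2.2MB" is taken as 2,200,000 bytes.
TOTAL_BYTES = 2_200_000
OBJECT_COUNT = 7700

# Assumption: 128 MB is the rule-of-thumb minimum file size recalled from
# the Athena tuning article; it is a guideline, not a hard limit.
GUIDELINE_BYTES = 128 * 1024 * 1024


def average_object_size(total_bytes: int, object_count: int) -> float:
    """Return the mean object size in bytes."""
    if object_count <= 0:
        raise ValueError("object_count must be positive")
    return total_bytes / object_count


avg_bytes = average_object_size(TOTAL_BYTES, OBJECT_COUNT)
# True when the whole dataset would fit inside a single guideline-sized file.
fits_in_one_file = TOTAL_BYTES <= GUIDELINE_BYTES

print(f"average object size: {avg_bytes:.0f} bytes")
print(f"whole dataset fits in one guideline-sized file: {fits_in_one_file}")
```

Each object averages under 300 bytes, so nearly all of the query time is likely per-file overhead rather than data scanning.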

Please let me know if I'm using it incorrectly.

Implementation feasibility

Are you willing to discuss the solution with us, decide on the approach, and assist with the implementation?

statefb commented 6 months ago

select * from ddb_export order by metadata.writetimestampmicros.n DESC limit 10;

Could you add a WHERE clause such as d.datehour BETWEEN '<yyyy/mm/dd/hh>' AND '<yyyy/mm/dd/hh>' to limit the partition range?

An example query can be found at:
https://github.com/aws-samples/bedrock-claude-chat/blob/main/docs/ADMINISTRATOR.md#query-per-bot-id
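As a minimal sketch of the suggestion above, the helper below builds the original query with a datehour partition filter added. It assumes the table is ddb_export aliased as d and that datehour uses the yyyy/mm/dd/hh format shown in the comment. The function name build_partitioned_query is hypothetical and not part of the repository.

```python
from datetime import datetime

# Assumption: the partition key format is yyyy/mm/dd/hh, as in the comment above.
DATEHOUR_FORMAT = "%Y/%m/%d/%H"


def build_partitioned_query(start: datetime, end: datetime, limit: int = 10) -> str:
    """Build the issue's query with a datehour range so fewer partitions are scanned.

    Hypothetical helper for illustration; the table name and alias are assumptions.
    """
    if start > end:
        raise ValueError("start must not be later than end")
    if limit <= 0:
        raise ValueError("limit must be positive")
    start_key = start.strftime(DATEHOUR_FORMAT)
    end_key = end.strftime(DATEHOUR_FORMAT)
    return (
        "SELECT * FROM ddb_export d "
        f"WHERE d.datehour BETWEEN '{start_key}' AND '{end_key}' "
        "ORDER BY metadata.writetimestampmicros.n DESC "
        f"LIMIT {int(limit)};"
    )


# Example: restrict the scan to two days of partitions.
query = build_partitioned_query(datetime(2024, 4, 1, 0), datetime(2024, 4, 2, 23))
print(query)
```

The resulting string can be pasted into the Athena console. Narrowing the datehour range should cut the number of small files Athena has to open, though it does not remove the small-file overhead inside the selected partitions.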