Closed Taikono-Himazin closed 5 months ago
select * from ddb_export order by metadata.writetimestampmicros.n DESC limit 10;
Could you add a WHERE clause such as d.datehour BETWEEN '<yyyy/mm/dd/hh>' AND '<yyyy/mm/dd/hh>'
to limit the partition range?
For an example query, see:
https://github.com/aws-samples/bedrock-claude-chat/blob/main/docs/ADMINISTRATOR.md#query-per-bot-id
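For illustration, here is a minimal sketch of how the partition filter suggested above could be added to the original query. It assumes `datehour` is a string partition column in zero-padded `yyyy/mm/dd/hh` form, so string comparison in `BETWEEN` matches chronological order. The helper name `build_query` and the table alias `d` are hypothetical, and this is not necessarily the query in the linked document.

```python
from datetime import datetime

# Assumed partition layout: zero-padded yyyy/mm/dd/hh, e.g. "2024/05/01/00".
PARTITION_FORMAT = "%Y/%m/%d/%H"


def build_query(start: datetime, end: datetime, limit: int = 10) -> str:
    """Build the issue's query restricted to a datehour partition range.

    Hypothetical helper: it only formats a SQL string and does not call Athena.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if limit <= 0:
        raise ValueError("limit must be positive")
    lo = start.strftime(PARTITION_FORMAT)
    hi = end.strftime(PARTITION_FORMAT)
    return (
        "SELECT * FROM ddb_export d "
        f"WHERE d.datehour BETWEEN '{lo}' AND '{hi}' "
        "ORDER BY d.metadata.writetimestampmicros.n DESC "
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    # Restrict the scan to a single day of hourly partitions.
    print(build_query(datetime(2024, 5, 1, 0), datetime(2024, 5, 1, 23)))
```

With the filter in place, Athena should only need to list and read the objects under the selected partitions rather than the whole export.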
Describe the solution you'd like
Analysis with Athena has been introduced, but even a simple query like the one below takes more than 10 minutes:
select * from ddb_export order by metadata.writetimestampmicros.n DESC limit 10;
I would like to shorten this to a few seconds.
Why the solution needed
As it stands now, it is too slow to be practical.
Additional context
This seems to be caused by the large number of small files in the S3 bucket being analyzed. When I counted, there were 7,700 files totaling 2.2 MB. If this continues, S3 request costs will also increase. According to the article below, it may be better to combine the files.
https://aws.amazon.com/jp/blogs/news/top-10-performance-tuning-tips-for-amazon-athena
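To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. It takes 2.2 MB as 2,200,000 bytes, and the 128 MB target file size is an assumption chosen for illustration, not a value measured in this project. The function name `summarize_small_files` is hypothetical.

```python
import math


def summarize_small_files(total_bytes: int, file_count: int, target_bytes: int):
    """Return (average file size in bytes, file count after merging to target size)."""
    if file_count <= 0 or target_bytes <= 0:
        raise ValueError("file_count and target_bytes must be positive")
    avg_bytes = total_bytes / file_count
    # At least one file remains even when the data is smaller than the target.
    merged_files = max(1, math.ceil(total_bytes / target_bytes))
    return avg_bytes, merged_files


if __name__ == "__main__":
    # Figures from the issue: 7,700 files totaling about 2.2 MB.
    # Assumed target: 128 MB per merged file (illustrative only).
    avg, merged = summarize_small_files(2_200_000, 7700, 128 * 1024 * 1024)
    print(f"average file size: {avg:.0f} bytes")
    print(f"files after merging: {merged}")
```

Under these assumptions each file averages under 300 bytes, so a full scan reads thousands of objects to fetch data that would fit in a single file.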
Please let me know if I am using it incorrectly.
Implementation feasibility
Are you willing to discuss the solution with us, decide on the approach, and assist with the implementation?