awslabs / athena-glue-service-logs

Glue scripts for converting AWS Service Logs for use in Athena
Apache License 2.0

Fix partition pagination for GroupedDatePartitioner #17

Closed dacort closed 5 years ago

dacort commented 5 years ago

For data sources that have a large number of partitions (>1000) and use the GroupedDatePartitioner (ALB, CloudTrail, and VPC logs), we were not fetching the full set of partitions from Glue. Because the logic for adding new partitions uses that existing list to seed the region names, some new partitions were being missed.

This fix paginates the Glue API calls so we get the full set of partitions. Relatedly, this is one more reason to investigate #15 so we don't have to worry about partition detection. :)
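
For reference, the general pagination pattern looks roughly like the sketch below. It is not the code in this PR: `fetch_all_partitions` is a hypothetical helper name, and it assumes a boto3-style Glue client whose `get_partitions` call returns a `Partitions` list plus a `NextToken` whenever more pages remain.

```python
def fetch_all_partitions(glue_client, database_name, table_name):
    """Collect every partition for a table by following NextToken.

    Hypothetical helper for illustration; glue_client is assumed to
    behave like boto3.client("glue").
    """
    partitions = []
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    while True:
        response = glue_client.get_partitions(**kwargs)
        partitions.extend(response.get("Partitions", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No token means this was the last page.
            break
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = next_token
    return partitions
```

With a real client this would be called as `fetch_all_partitions(boto3.client("glue"), "my_database", "my_table")`, where the database and table names are placeholders. boto3 also provides `get_paginator("get_partitions")`, which wraps the same token handling.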

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.