apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Adaptive retry for Glue Catalog calls to fix Glue throttling #1294

Open mark-major opened 16 hours ago

mark-major commented 16 hours ago

Feature Request / Improvement

I have experienced throttling exceptions when multiple nodes are reading and writing an Iceberg table with a Glue catalog. I have wrapped my library calls in a retry wrapper, but this functionality should be implemented in the library instead.
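For illustration only, a caller-side retry wrapper of the kind described above might look like the sketch below. It is not the author's actual wrapper: the name `call_with_retry` and all of its parameters are hypothetical, and it uses exponential backoff with full jitter as one common choice. In practice `retry_on` would be narrowed to the throttling error the client raises rather than a broad `Exception`.

```python
import random
import time


def call_with_retry(func, *, max_attempts=5, base_delay=0.5, max_delay=20.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying failures with exponential backoff and full jitter.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, then a random delay
            # in [0, cap] so independent processes do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the wait can be replaced in tests; a caller would normally leave it at the default and pass the catalog call as a zero-argument callable, for example a `lambda`.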

I would add an option to the Glue catalog constructor to define the maximum number of retries, and set the boto retry mode to standard instead of the default legacy. Standard mode adds jitter to the retry interval, which would keep independent processes from retrying in lockstep and blocking each other (the thundering herd problem).

Fokko commented 15 hours ago

Hey @mark-major thanks for jumping in here. I see that we only added retries to the REST catalog so far. Having this for the Glue catalog would be a great addition 👍

mark-major commented 15 hours ago

I'm happy to work on this if I have some time.

Fokko commented 15 hours ago

That would be great, happy to review! 🙌

kevinjqliu commented 2 hours ago

We use boto3 to create the Glue client.

It looks like boto3 has a built-in retry mechanism: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html#configuring-a-retry-mode
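As a stopgap that needs no library change, the retry guide linked above also documents the `AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS` environment variables. A minimal sketch, assuming they are set before the Glue client is created and that no explicit client config overrides them; the function name `enable_standard_retries` is hypothetical.

```python
import os


def enable_standard_retries(max_attempts=10, env=os.environ):
    """Set the boto retry environment variables (hypothetical helper).

    AWS_MAX_ATTEMPTS counts the initial request as well as the retries.
    """
    env["AWS_RETRY_MODE"] = "standard"
    env["AWS_MAX_ATTEMPTS"] = str(max_attempts)
    return env
```

Calling `enable_standard_retries()` at process start, before the catalog is loaded, should be enough for clients created afterwards to pick up the settings.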