Closed: OperationalFallacy closed this issue 2 years ago
Hey guys, any updates on the matter?
We apologize for the delay.
As described in the document below, you can configure dynamodb.output.retry
to allow more retries when writes are throttled. Increasing it to a higher value helps avoid job failures caused by write throttling.
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-dynamodb
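For reference, here is a minimal sketch of how those connection options are typically passed to the Glue DynamoDB sink. The option keys come from the linked documentation; the table name and specific values below are illustrative assumptions, not taken from this thread:

```python
# Hypothetical Glue DynamoDB sink options (values are assumptions for illustration).
connection_options = {
    "dynamodb.output.tableName": "my-target-table",  # assumed table name
    "dynamodb.throughput.write.percent": "1.0",      # fraction of write capacity Glue may consume
    "dynamodb.output.retry": "30",                   # raised from the default of 10 to ride out throttling
}

# Inside a Glue job this dict would be passed to the sink, e.g.:
# glue_context.write_dynamic_frame_from_options(
#     frame=dynamic_frame,
#     connection_type="dynamodb",
#     connection_options=connection_options,
# )
```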
I'm trying to copy a DynamoDB table using the Glue read and write sinks. This is a small test table: 20 million items, about 500 MB in size,
with an average item size of 28 bytes.
The read sink works just fine. The job reads items pretty fast with 3-5 workers, and it takes only a few minutes.
However, the write sink is a disaster. The requests get throttled, and it can barely write 100k records a minute. I've already tried all the Glue versions.
It actually fails fast once retries are exhausted. I had to raise "dynamodb.output.retry" to 30-50, because with the default of 10 the Glue job fails as soon as it starts writing, with:
An error occurred while calling o70.pyWriteDynamicFrame. DynamoDB write exceeds max retry 10
This is the sink in the Python code:
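(The original snippet was not preserved in this thread, so the following is a hypothetical reconstruction of such a sink, assuming a table of currency rates and the retry value of 30 mentioned above; the write call itself is shown as a comment because it requires the awsglue runtime:)

```python
# Hypothetical reconstruction of the write sink (table name is an assumption).
connection_options = {
    "dynamodb.output.tableName": "currency-rates",   # assumed name based on the description
    "dynamodb.output.retry": "30",                   # raised from default 10, as described above
}

# The actual sink call inside the Glue job would look like:
# glue_context.write_dynamic_frame_from_options(
#     frame=mapped_frame,
#     connection_type="dynamodb",
#     connection_options=connection_options,
# )
```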
Where could the problem be?! This is what the table metrics look like while Glue is trying to write.
The table is pretty simple: records of currency rates, like this
Thanks!