databricks / iceberg-kafka-connect

Apache License 2.0
220 stars 49 forks

Could I configure the RETRY behavior of the AWS SDK on S3FileIO for a specific HOT connector? #231

Closed okayhooni closed 5 months ago

okayhooni commented 7 months ago

After I deployed this sink connector on our heaviest-traffic application log topic, I noticed that every morning, when people in our country wake up and start their activities, a couple of tasks of this HOT connector become FAILED with an S3 503 error carrying the message "Please reduce your request rate."

Although I have already enabled Iceberg's native object store file layout option (the `write.object-storage.enabled` table property) for all the sink tables, this issue persists almost every day, and only on that HOT log sink out of all the sinks I deployed.

When I manually restart the tasks that failed with this 503 error, they all become healthy again with no issue.

As you know, this is related to S3's adaptive scaling, which partitions capacity based on observed request patterns. But there is no way to pre-train it on our daily log influx pattern.

So, I found the AWS SDK documentation about retry behavior, linked below.

I suspect this issue can be alleviated by configuring `max_attempts` and `retry_mode`.
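Conceptually, raising `max_attempts` and switching `retry_mode` makes the SDK absorb transient 503s by retrying with capped exponential backoff and jitter instead of failing the task. A minimal self-contained sketch of that idea (illustrative only; the names here are not the AWS SDK's actual API):

```python
import random
import time

class ThrottlingError(Exception):
    """Stands in for S3's 503 'Please reduce your request rate' response."""

def retry_with_backoff(call, max_attempts=10, base_delay=0.1, max_delay=20.0):
    """Retry `call` on throttling errors, sleeping between attempts with
    capped exponential backoff plus full jitter (what retry_mode governs)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts:
                raise  # retries exhausted; surface the error to the caller
            # Exponential backoff capped at max_delay, randomized to
            # spread retries out instead of hammering S3 in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```

The adaptive mode additionally rate-limits outgoing requests based on observed throttling, but the backoff-and-retry loop above is the part that turns a morning traffic spike from a task failure into a brief slowdown.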

(example of full stack trace)

okayhooni commented 5 months ago

This issue was resolved by applying the adaptive retry mode of the AWS SDK, as I mentioned above!

```
AWS_RETRY_MODE: adaptive  # default: legacy
AWS_MAX_ATTEMPTS: 10      # default: 3
```