logorrheic opened this issue 5 months ago (status: Open)
I wonder if this can be solved with https://github.com/tabular-io/iceberg-kafka-connect/pull/233, since at the end of the day it's a failure handler with pluggable classes for dispatching back to the connector after an exception happens. Users are meant to write their own handler and catch the exceptions that matter to them.
Also, you can't retry indefinitely; Kafka Connect will consider the task a zombie at some point.
Best practice is to let the task die and have something external continuously restart your Kafka Connect task, e.g. along the lines of the sketch below.
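For what it's worth, such an external restart loop can be built on the standard Kafka Connect REST API (GET /connectors/{name}/status and POST /connectors/{name}/tasks/{id}/restart). The sketch below is only illustrative: the worker endpoint http://localhost:8083, the connector name iceberg-sink, and the back-off values are placeholders, not anything from this issue.

```python
# restart_failed_tasks.py -- minimal sketch of an external "keep restarting" loop
# for a Kafka Connect sink task, using the standard Connect REST API.
# Assumptions (not from the issue): worker REST endpoint at http://localhost:8083,
# connector named "iceberg-sink", simple exponential back-off between polls.
import time
import requests

CONNECT_URL = "http://localhost:8083"   # placeholder worker REST endpoint
CONNECTOR = "iceberg-sink"              # placeholder connector name


def restart_failed_tasks(backoff_s: float = 30.0, max_backoff_s: float = 600.0) -> None:
    delay = backoff_s
    while True:
        # GET /connectors/{name}/status reports each task's id and state.
        status = requests.get(
            f"{CONNECT_URL}/connectors/{CONNECTOR}/status", timeout=10
        ).json()
        failed = [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]
        for task_id in failed:
            # POST /connectors/{name}/tasks/{id}/restart restarts a single failed task.
            requests.post(
                f"{CONNECT_URL}/connectors/{CONNECTOR}/tasks/{task_id}/restart", timeout=10
            )
        # Back off while tasks keep failing; reset the delay once everything is healthy.
        delay = min(delay * 2, max_backoff_s) if failed else backoff_s
        time.sleep(delay)


if __name__ == "__main__":
    restart_failed_tasks()
```

The same idea can be run as a Kubernetes CronJob, a systemd timer, or a sidecar; the point is that the retry/back-off loop lives outside the worker, so the task itself can fail fast.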
Hello @logorrheic,
I resolved a similar issue by enabling the adaptive retry mode in the AWS SDK. I guess it may also help in your case:
AWS_RETRY_MODE : adaptive # DEFAULT: legacy
AWS_MAX_ATTEMPTS : 10 # DEFAULT: 3
This is a similar report to https://github.com/tabular-io/iceberg-kafka-connect/issues/231: we see these task failures frequently, often daily or weekly. Rather than request more configuration parameters, I'd like to question whether these responses really are unrecoverable. If the task could continuously retry (with some back-off period), that would suit us better.
Here's a recent stack:
We're running connector 0.6.18 on Kafka 3.7.0.