databricks / iceberg-kafka-connect

Apache License 2.0

Improve initial connection timeouts with Amazon MSK clusters #275

Open antcalvente opened 4 months ago

antcalvente commented 4 months ago

This PR fixes an issue where the control topic consumers fail to join their consumer group in a cloud environment. The current timeout is set to 1 second, which is not enough.

Before this PR: the cg-control-XXX consumers do not always join in time, and the connector fails.

After this PR: the new timeout allows the consumers to join the consumer group, and sink connectors work without issues.
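To illustrate why a short timeout fails when the group join takes longer than expected, here is a minimal sketch. It is not the connector's code: `JoinWait`, `awaitJoin`, and the simulated 300 ms join latency are hypothetical names and figures chosen for illustration only, and the join check is a stand-in for whatever the connector actually polls.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the connector: polls a condition
// until it holds or the timeout elapses.
final class JoinWait {

    // Returns true if `joined` reported true before `timeout` elapsed,
    // false on timeout or if the waiting thread was interrupted.
    static boolean awaitJoin(BooleanSupplier joined, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (joined.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.min(pollInterval.toMillis(), Math.max(1, remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated broker: the join completes about 300 ms after start (assumed figure).
        long start = System.nanoTime();
        BooleanSupplier joined = () -> System.nanoTime() - start >= Duration.ofMillis(300).toNanos();

        // A timeout shorter than the join latency gives up too early.
        System.out.println("short timeout joined: "
                + awaitJoin(joined, Duration.ofMillis(100), Duration.ofMillis(20)));
        // A longer timeout lets the join complete.
        System.out.println("long timeout joined: "
                + awaitJoin(joined, Duration.ofSeconds(2), Duration.ofMillis(20)));
    }
}
```

The same reasoning applies to the cloud case described above: if the network round trips needed to join the group exceed the configured wait, the caller gives up even though the join would have succeeded shortly afterwards.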

How to reproduce the issue?

  1. Create an Amazon MSK cluster with 1 topic (a local cluster will not reproduce the issue, because on a local network the consumers will almost certainly join in time)
  2. Install kafka-ui or any other tool that lets you inspect the consumers of the topic
  3. Create an Iceberg connector through MSK Connect
  4. Check the logs in CloudWatch, or in whichever logging tool you enabled when creating the connector
  5. Post a message after initialisation
  6. Check whether the message is consumed (in the logs) and saved in your target system

Note: The connector will be marked as Running on MSK even if it could not start properly, in which case it will not consume any messages.