databricks / iceberg-kafka-connect

Apache License 2.0

Ensure exactly-once on connector task(w/ coordinator) rebalancing #279

Closed: okayhooni closed this 2 months ago

okayhooni commented 2 months ago

Context

I found duplicated records in a CDC sink written by this Iceberg sink connector after I started using spot nodes and enabled Karpenter's node consolidation feature. Duplicates are very rare, but when they do occur, they tend to occur consecutively. In a related issue, @bryanck told me that the Iceberg version of the connector has added safeguard logic to ensure that no more than one coordinator task runs at a time while the connector is rebalancing.

Commit Contents

Related Links

cc/ @fqtab