databricks / iceberg-kafka-connect

Apache License 2.0
213 stars 47 forks

Ensure exactly-once delivery when connector tasks (with coordinator) rebalance, as in the Apache version #280

Open okayhooni opened 2 months ago

okayhooni commented 2 months ago

Re-opening PR #279 with additional defensive logic authored by @bryanck.

Context

I found duplicate records in a CDC sink written by this Iceberg sink connector after we started using spot nodes and enabled Karpenter's node consolidation feature. The duplication is very rare, but when it does occur, it tends to occur several times in a row. In a related issue, @bryanck told me that the Apache Iceberg version of the connector has added safeguard logic to ensure that no more than one coordinator task runs at a time while the connector is rebalancing.
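To make the "at most one coordinator" idea concrete, here is a minimal simulation. It is not the connector's code (the connector is written in Java; Python is used only for brevity). It assumes a hypothetical rule in which the task that owns one designated partition acts as coordinator, and that a task stops coordinating as soon as that partition is revoked. `Task`, `LEADER_PARTITION`, and the callback names are all invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical (topic, partition) pair whose owner acts as coordinator.
LEADER_PARTITION = ("events", 0)


@dataclass
class Task:
    """Illustrative stand-in for one sink task; not a real connector class."""
    name: str
    assigned: set = field(default_factory=set)
    coordinating: bool = False

    def on_partitions_revoked(self, revoked):
        # Give up the coordinator role as soon as the leader partition is lost,
        # so the next owner never overlaps with a still-running coordinator.
        self.assigned -= set(revoked)
        if self.coordinating and LEADER_PARTITION not in self.assigned:
            self.coordinating = False

    def on_partitions_assigned(self, added):
        # Become coordinator only while owning the leader partition.
        self.assigned |= set(added)
        if LEADER_PARTITION in self.assigned:
            self.coordinating = True


def active_coordinators(tasks):
    """Return the names of all tasks currently acting as coordinator."""
    return [t.name for t in tasks if t.coordinating]


# Initial assignment: task-0 owns the leader partition.
task0, task1 = Task("task-0"), Task("task-1")
task0.on_partitions_assigned([("events", 0), ("events", 1)])
task1.on_partitions_assigned([("events", 2)])
before = active_coordinators([task0, task1])

# Rebalance: the leader partition is revoked from task-0, then given to task-1.
task0.on_partitions_revoked([("events", 0)])
during = active_coordinators([task0, task1])
task1.on_partitions_assigned([("events", 0)])
after = active_coordinators([task0, task1])
```

In this sketch the coordinator count never exceeds one at any step: it is one before the rebalance, zero between revocation and reassignment, and one afterwards. The duplicates described above would correspond to a window in which two tasks both believe they are the coordinator.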

Commit Contents

Related Links

cc/ @fqtab

okayhooni commented 2 months ago

@fqaiser94 ( @fqtab )

Could you review this commit and the related discussions in the #kafka-connect channel of the apache-iceberg Slack community?