databricks / iceberg-kafka-connect

Apache License 2.0

Upgrade to iceberg 1.5.2 #235

Closed: tabmatfournier closed this 6 months ago

tabmatfournier commented 7 months ago

Updates connector to Iceberg 1.5

The iceberg-kafka-connect-events module has been deprecated in this repo: consider these legacy events. Instead we rely on the iceberg-kafka-connect-events module that has been moved into Iceberg core. This will make it easier to move the remainder of the repository over there. This PR updates worker/coordinator/etc. to work off the (significantly) reworked event classes.

The Avro payloads produced by the worker/coordinator have changed in a way that is not backward compatible. This is unavoidable at the moment because the codebase has been (partially) ported to Iceberg core and because of the changes made there (see: https://github.com/apache/iceberg/pull/8701#discussion_r1348114157).

Encoding relies heavily on the Iceberg Avro utils because we have to encode data files, delete files, etc. Unfortunately, changes were made upstream that break the format. This puts us in an awkward spot: we either have to port all the Avro machinery from Iceberg to maintain the same format, or make the breaking change.

This PR ports the machinery from Iceberg 1.4.x to provide a fallback decoder, used when decoding fails because a legacy record was left behind in the control topic during the upgrade. This code should be removed in a future release.
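For readers unfamiliar with the pattern, here is a minimal sketch of a try-current-then-legacy decoder. It is not the code in this PR: the class `FallbackDecoderSketch`, the `Event` type, and the toy `v1:`/`v2:` prefixes are all hypothetical stand-ins for the real Avro decoders and event classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class FallbackDecoderSketch {

    /** Hypothetical decoded event; a stand-in for the connector's real event classes. */
    public static final class Event {
        public final String format;
        public final String payload;

        Event(String format, String payload) {
            this.format = format;
            this.payload = payload;
        }
    }

    /** Try the current decoder first; if it throws, retry with the legacy decoder. */
    public static Event decode(byte[] bytes,
                               Function<byte[], Event> current,
                               Function<byte[], Event> legacy) {
        try {
            return current.apply(bytes);
        } catch (RuntimeException currentFailure) {
            try {
                return legacy.apply(bytes);
            } catch (RuntimeException legacyFailure) {
                // Keep the original failure attached for easier debugging.
                legacyFailure.addSuppressed(currentFailure);
                throw legacyFailure;
            }
        }
    }

    /** Toy "current format" decoder (assumption: records are tagged with "v2:"). */
    public static Event decodeCurrent(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        if (!s.startsWith("v2:")) {
            throw new IllegalArgumentException("not a current-format record");
        }
        return new Event("current", s.substring(3));
    }

    /** Toy "legacy format" decoder (assumption: records are tagged with "v1:"). */
    public static Event decodeLegacy(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        if (!s.startsWith("v1:")) {
            throw new IllegalArgumentException("not a legacy-format record");
        }
        return new Event("legacy", s.substring(3));
    }

    public static void main(String[] args) {
        // A record left behind in the control topic by the old version.
        byte[] leftover = "v1:commit-42".getBytes(StandardCharsets.UTF_8);
        Event e = decode(leftover,
                FallbackDecoderSketch::decodeCurrent,
                FallbackDecoderSketch::decodeLegacy);
        System.out.println(e.format + " " + e.payload); // prints "legacy commit-42"
    }
}
```

The legacy path is only reached when the current decoder fails, so records in the new format pay no extra cost, and the fallback can be deleted once no legacy records remain in the control topic.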

Because of this fallback, there is no breaking change for users. Eventing now depends on the donated connector code in iceberg-core, which should make the migration non-breaking for our users when the rest of the code is ported over.

tabmatfournier commented 6 months ago

Should have a branch up shortly that deals with the breaking change instead. Much better for the users: no drain mode needed.

sullis commented 6 months ago

Iceberg 1.5.2 is the latest release: https://iceberg.apache.org/releases/