apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Kafka Connect: Add delta writer support #10842

Open bryanck opened 2 months ago

bryanck commented 2 months ago

Feature Request / Improvement

The initial Kafka Connect sink submission did not include the delta writer support that the Tabular version has, because of performance concerns over relying on equality deletes. We should address those concerns and add delta writer support back, for features such as change data capture and upsert mode.

Query engine

None

Willingness to contribute

ajantha-bhat commented 2 months ago

Can we discuss the "performance concerns over relying on equality deletes" more openly here?

Is the performance concern about writing equality deletes from the Kafka Connect sink, or about reading equality deletes on the engine side?

bryanck commented 2 months ago

The concerns are over relying on equality deletes to apply the deltas, and read performance of equality deletes in general from the engines.
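For readers unfamiliar with the trade-off, here is a minimal sketch of why equality deletes are cheap to write and costly to read. It is a simplified in-memory model, not the Kafka Connect sink or the Iceberg API: `DataFile`, `EqualityDeleteFile`, `upsert`, and `scan` are hypothetical names, and the only rule taken from the Iceberg table spec is that an equality delete applies to data files with a strictly lower data sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    seq: int      # data sequence number assigned at commit (hypothetical model)
    rows: tuple   # rows stored as (id, value) pairs


@dataclass(frozen=True)
class EqualityDeleteFile:
    seq: int        # data sequence number of the delete
    ids: frozenset  # key values that this file deletes


def upsert(data_files, delete_files, seq, rows):
    """Write side: a blind upsert that never reads the table.

    The writer emits "delete any older row with these ids" plus the new rows.
    """
    delete_files.append(EqualityDeleteFile(seq, frozenset(r[0] for r in rows)))
    data_files.append(DataFile(seq, tuple(rows)))


def scan(data_files, delete_files):
    """Read side: every data file is checked against all newer delete files."""
    live = []
    for f in data_files:
        deleted = set()
        for d in delete_files:
            # An equality delete applies only to strictly older data files.
            if d.seq > f.seq:
                deleted |= d.ids
        live.extend(r for r in f.rows if r[0] not in deleted)
    return live


data, deletes = [], []
upsert(data, deletes, 1, [(1, "a"), (2, "b")])
upsert(data, deletes, 2, [(1, "a2")])   # replaces id 1 without reading it first
print(sorted(scan(data, deletes)))
```

In this model the write path is a pair of appends, while the read path has to match each data file against every newer delete file. That cost grows as delete files accumulate until they are compacted away.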

ajantha-bhat commented 1 month ago

> The concerns are over relying on equality deletes to apply the deltas, and read performance of equality deletes in general from the engines.

This should not be a blocker for writing equality deletes from Kafka Connect. When Flink writes them, the same performance concern exists for other engines. Since equality deletes are already used in production, we should not block this. What we can do is open a separate ticket to analyze and improve equality delete read performance. Happy to contribute on that side.

CDC, tombstone handling, and upsert are very important features for users. An append-only Kafka Connect sink alone may not be very useful in production. So I am in favour of bringing over whatever is in the Tabular repo and improving on it. @danielcweeks, @fqaiser94, @bryanck, @stevenzwu: Thoughts?

ismailsimsek commented 1 month ago

+1 to adding this feature. Maybe it makes sense not to enable it by default (keeping append mode as the default), so end users activate it when needed. Also happy to help.