databricks / iceberg-kafka-connect


How about adding an option to use an adaptive commit interval depending on cumulative record count? #155

Open okayhooni opened 11 months ago

okayhooni commented 11 months ago

Currently, there is only the iceberg.control.commit.interval-ms option, which commits to the Iceberg table at a fixed interval.
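For context, a sink configuration today can only set this fixed interval. The snippet below is an illustrative sketch, not an exact config from our deployment: the connector name, topic, and table names are placeholders, and iceberg.control.commit.interval-ms is the only option relevant to this issue.

```json
{
  "name": "web-logs-iceberg-sink",
  "config": {
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "topics": "web-logs",
    "iceberg.tables": "logs.web_logs",
    "iceberg.control.commit.interval-ms": "300000"
  }
}
```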

The web logs ingested by this sink connector vary in volume over time, so we cannot use the connector's consumer lag metric to monitor overload on the sink connector itself (please see the chart below).

[chart: consumer lag fluctuation over time]

If there were an option like iceberg.control.commit.max-records-count, it could be used to adapt the fixed commit interval of the Iceberg table and alleviate the consumer lag fluctuation.

If an iceberg.control.commit.max-records-count value is provided, then the iceberg.control.commit.interval-ms value could still act as the maximum commit interval, serving as a fallback when the record-count threshold is not reached.
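A minimal sketch of the proposed behavior, assuming a hypothetical AdaptiveCommitTrigger helper (this is not the connector's actual commit coordinator): commit as soon as the cumulative record count reaches the proposed iceberg.control.commit.max-records-count, and fall back to the existing iceberg.control.commit.interval-ms when the count threshold is not reached.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper illustrating the proposed trigger logic.
class AdaptiveCommitTrigger {
  private final long maxRecordsCount;  // proposed iceberg.control.commit.max-records-count
  private final Duration maxInterval;  // existing iceberg.control.commit.interval-ms
  private final Clock clock;
  private long recordsSinceLastCommit = 0;
  private Instant lastCommit;

  AdaptiveCommitTrigger(long maxRecordsCount, Duration maxInterval, Clock clock) {
    this.maxRecordsCount = maxRecordsCount;
    this.maxInterval = maxInterval;
    this.clock = clock;
    this.lastCommit = clock.instant();
  }

  // Called for each batch of records written by the sink tasks.
  void recordsWritten(long count) {
    recordsSinceLastCommit += count;
  }

  // True when either the record-count threshold or the time-based fallback is reached.
  boolean shouldCommit() {
    boolean countReached = recordsSinceLastCommit >= maxRecordsCount;
    boolean intervalElapsed =
        Duration.between(lastCommit, clock.instant()).compareTo(maxInterval) >= 0;
    return countReached || intervalElapsed;
  }

  // Reset the counters after a commit completes.
  void committed() {
    recordsSinceLastCommit = 0;
    lastCommit = clock.instant();
  }
}
```

With this shape, high-volume periods would trigger more frequent commits bounded by record count, while low-volume periods would still commit at the configured maximum interval.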

bryanck commented 11 months ago

I think this is a reasonable feature to add, though it will probably have to wait until after the sink is merged into the Iceberg project. You can keep this open so we can track it.

ArkaSarkar19 commented 5 months ago

Hi, I have a similar use case, where the commit should happen based on a certain number of records or a certain data size. Is this being considered as a feature to add?