Is it okay to use multiple Iceberg connectors simultaneously on the same target table?

databricks / iceberg-kafka-connect

Apache License 2.0


Issue #151 (closed by okayhooni 1 year ago)

okayhooni commented 1 year ago

If different converters are needed to ingest into the same target table, this can't be handled with the multiple-topics parameter on a single connector, so multiple connectors are needed.

Is it safe to deploy multiple Iceberg sink connectors simultaneously against the same target table, given Iceberg's optimistic concurrency control?

(Thank you, as always, for your kind replies.)

bryanck commented 1 year ago

Full disclosure, I haven't tested this, but it should work if the consumer group ID is unique for each sink. You can either give each connector a different name, or set iceberg.control.group-id to ensure this. You could also use a separate control topic for each. One issue I can see is that the kafka.connect.vtts snapshot property value will not be guaranteed to always increase.
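A minimal sketch of the setup described above, assuming the Kafka Connect REST API's {"name": ..., "config": {...}} payload shape. The connector class name, topic names, table name, control-topic names, and the make_sink_config / check_unique_control_groups helpers are hypothetical illustrations; catalog settings (iceberg.catalog.*) are omitted. It only builds the configs and checks that each connector gets its own control consumer group; it does not exercise the concurrent commits themselves.

```python
import json

# Assumption: connector class name used by databricks/iceberg-kafka-connect
# (formerly tabular-io); verify it against the version you actually deploy.
CONNECTOR_CLASS = "io.tabular.iceberg.connect.IcebergSinkConnector"


def make_sink_config(name: str, topics: str, table: str) -> dict:
    """Build a hypothetical sink config with per-connector control settings."""
    return {
        "name": name,
        "config": {
            "connector.class": CONNECTOR_CLASS,
            "topics": topics,
            "iceberg.tables": table,
            # Unique consumer group per sink, as suggested in the comment above.
            "iceberg.control.group-id": f"cg-control-{name}",
            # Optional: a separate control topic per connector.
            "iceberg.control.topic": f"control-iceberg-{name}",
        },
    }


def check_unique_control_groups(configs: list) -> None:
    """Raise ValueError if two connectors would share a control consumer group."""
    seen = {}
    for c in configs:
        gid = c["config"]["iceberg.control.group-id"]
        if gid in seen:
            raise ValueError(f"{c['name']} and {seen[gid]} share group id {gid}")
        seen[gid] = c["name"]


# Two sinks writing to the same (hypothetical) table from different topics.
sinks = [
    make_sink_config("events-sink-schema", "events_with_schema", "db.events"),
    make_sink_config("events-sink-schemaless", "events_schemaless", "db.events"),
]
check_unique_control_groups(sinks)
print(json.dumps(sinks[0], indent=2))
```

Deriving the group ID from the connector name mirrors the "give each connector a different name" option; setting iceberg.control.group-id explicitly just makes the uniqueness visible in the config.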

bryanck commented 1 year ago

I didn't quite understand the rationale for doing this, however. When you say "different converter", do you mean an SMT?

okayhooni commented 1 year ago

> I didn't quite understand the rationale for doing this, however. When you say "different converter", do you mean an SMT?

No, not an SMT. I mean different converters, like the ones below:

JsonSchemaConverter (for JSON with schema)
JsonConverter (for schema-less JSON)
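Converter settings apply per connector, which is why the two formats above end up in separate connectors. A small sketch of the per-connector overrides, assuming the standard Confluent JsonSchemaConverter and Apache Kafka JsonConverter class names; the schema registry URL, table and topic names, and the with_converter helper are hypothetical placeholders.

```python
# Value-converter overrides for each input format (hypothetical registry URL).
CONVERTERS = {
    "json_schema": {
        "value.converter": "io.confluent.connect.json.JsonSchemaConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
    },
    "schemaless_json": {
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Plain JSON without the embedded schema/payload envelope.
        "value.converter.schemas.enable": "false",
    },
}


def with_converter(base: dict, kind: str, topics: str) -> dict:
    """Return a copy of a base connector config with one converter and topic set."""
    merged = dict(base)  # copy so the shared base config is not mutated
    merged["topics"] = topics
    merged.update(CONVERTERS[kind])
    return merged


base = {"iceberg.tables": "db.events"}  # same target table for both connectors
schema_cfg = with_converter(base, "json_schema", "events_with_schema")
schemaless_cfg = with_converter(base, "schemaless_json", "events_schemaless")
print(schema_cfg["value.converter"])
print(schemaless_cfg["value.converter"])
```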

bryanck commented 1 year ago

I see, thanks. If you try it let us know how it goes.

okayhooni commented 1 year ago

Sure! Thanks for the quick answer!

By the way, could you give some advice on making rewrite_data_files() run faster on Spark, to compact the small files that streaming writes into an Iceberg table?

The procedure was too slow, and I found that it processed each partition path sequentially in a single Spark task rather than in parallel (the Spark plan always shows just one task).

I searched related issues on the Iceberg GitHub, but couldn't find any useful clues for solving it.
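The thread leaves this question open. As a hedged starting point only: the rewrite_data_files procedure accepts an options map, and max-concurrent-file-group-rewrites controls how many file groups are rewritten at once (Iceberg's documented default is 5). Whether raising it addresses the single-task behavior described here is untested. The catalog and table names and the rewrite_data_files_sql helper are hypothetical; the sketch only builds the SQL string.

```python
def rewrite_data_files_sql(catalog: str, table: str, options: dict) -> str:
    """Build a CALL statement for Iceberg's rewrite_data_files Spark procedure."""
    # Spark SQL's map('k1', 'v1', 'k2', 'v2') constructor takes alternating keys/values.
    pairs = ", ".join(f"'{k}', '{v}'" for k, v in options.items())
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => 'binpack', options => map({pairs}))"
    )


stmt = rewrite_data_files_sql(
    "my_catalog",  # hypothetical catalog name
    "db.events",   # hypothetical table name
    {
        # How many file groups may be rewritten concurrently.
        "max-concurrent-file-group-rewrites": "20",
        # Commit finished file groups incrementally instead of all at the end.
        "partial-progress.enabled": "true",
    },
)
print(stmt)
# With a SparkSession configured for the Iceberg catalog: spark.sql(stmt).show()
```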

bryanck commented 1 year ago

You might try posting this in the general channel of the Iceberg Slack; people there are very helpful.

bryanck commented 1 year ago

(There is also a kafka-connect channel there if interested)

okayhooni commented 1 year ago

Thanks! I will join!