Closed: okayhooni closed this issue 1 year ago
Full disclosure, I haven't tested this, but it should work if the consumer group ID is unique for each sink. You can either give each connector a different name, or set `iceberg.control.group-id` to ensure this. You could also use a separate control topic for each. One issue I can see is that the `kafka.connect.vtts` snapshot property value will not be guaranteed to always increase.
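As a rough sketch, two sink configs with distinct control group IDs and control topics might look like this (connector names, topic names, and the `iceberg.control.topic` property are illustrative assumptions, not taken from this thread):

```json
{
  "name": "iceberg-sink-a",
  "config": {
    "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
    "topics": "events-a",
    "iceberg.control.group-id": "cc-iceberg-sink-a",
    "iceberg.control.topic": "control-iceberg-sink-a"
  }
}
```

A second connector (`iceberg-sink-b`) would use its own `iceberg.control.group-id` and, optionally, its own control topic.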
I didn't quite understand the rationale for doing this however, when you say different converter do you mean SMT?
No, not an SMT; I mean different converters, like the ones below.
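For example, the stock Kafka Connect and Confluent converters can differ per connector (the settings below are illustrative; the Schema Registry URL is a placeholder):

```properties
# Connector 1: plain JSON records
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false

# Connector 2: Avro records via Schema Registry
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```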
I see, thanks. If you try it let us know how it goes.
Sure! thanks for quick answer!
By the way, could you give some advice on running `rewrite_data_files()` faster on Spark, to compact the small files that streaming writes into an Iceberg table?
That procedure was too slow, and I found that it processed each partition path sequentially on only one Spark task, not in parallel (on the Spark plan, it always used just one task!).
I searched related issues on the Iceberg GitHub, but I couldn't find any useful clue for solving it.
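In case it's useful to others reading this, Iceberg's `rewrite_data_files` Spark procedure accepts options that affect parallelism; a sketch, with placeholder catalog/table names and option availability depending on your Iceberg version:

```sql
CALL my_catalog.system.rewrite_data_files(
  table => 'db.events',
  options => map(
    'max-concurrent-file-group-rewrites', '8',  -- rewrite file groups concurrently
    'partial-progress.enabled', 'true',         -- commit finished groups incrementally
    'target-file-size-bytes', '536870912'       -- aim for ~512 MB output files
  )
)
```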
You might try posting this in the Iceberg Slack `general` channel; people there are very helpful. (There is also a `kafka-connect` channel there, if interested.)
Thanks! I will join!
If different converters are needed to ingest into the same target table, the topics cannot all be handled by the `topics` parameter of a single connector, so multiple connectors are needed.
Is it safe to deploy multiple Iceberg sink connectors simultaneously against the same target table, with respect to optimistic concurrency on the Iceberg table?
(Thank you, as always, for your kind replies)