Hi,
I am implementing a CDC pipeline from Oracle, where some of the source tables do not have explicit primary keys. We specify the id columns in the sink connector configuration based on our knowledge of the data (there is no actual constraint on the source), and the sink connector works fine.
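For context, the sink configuration looks roughly like the sketch below. This is illustrative only: the topic and table names are made up, and the id-columns property assumes the Iceberg sink connector, so the exact property names may differ for another sink.

```properties
# Minimal sketch of a sink connector config (standalone .properties form).
# Names and values are illustrative, not our actual configuration.
connector.class=org.apache.iceberg.connect.IcebergSinkConnector
topics=oracle.cdc.orders
tasks.max=1

# Id columns declared purely from data awareness; there is no PK constraint in Oracle.
iceberg.tables=db.orders
iceberg.tables.default-id-columns=order_id
```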
However, my concern is that the lack of a primary key on the source means the Kafka records carry null keys, so multiple updates to the same source record are not guaranteed any ordering in Kafka (Kafka producer partitioning behaviour). If we then set tasks.max > 1 in the sink connector properties, updates to the same record may be processed by different tasks (workers), and in a different order.
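To make the ordering concern concrete, here is a minimal producer sketch (not our actual pipeline; broker address, topic, and payloads are made up). With the default partitioner, null-keyed records can be spread across partitions, while records sharing a non-null key always land on the same partition, where the broker preserves their order.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyOrderingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Null key: the default partitioner spreads records across partitions
            // (sticky/round-robin), so UPDATE#1 and UPDATE#2 for the same row can end up
            // on different partitions, be consumed by different sink tasks, and lose order.
            producer.send(new ProducerRecord<>("oracle.cdc.orders", null, "UPDATE#1 row=42"));
            producer.send(new ProducerRecord<>("oracle.cdc.orders", null, "UPDATE#2 row=42"));

            // Row id as the key: both updates hash to the same partition, so the broker
            // preserves their order and a single sink task consumes them in sequence.
            producer.send(new ProducerRecord<>("oracle.cdc.orders", "42", "UPDATE#1 row=42"));
            producer.send(new ProducerRecord<>("oracle.cdc.orders", "42", "UPDATE#2 row=42"));
        }
    }
}
```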
Could this result in inconsistent behaviour at commit time, for example the update ordering effectively being reversed because the coordinator commits the second update in the first batch and the first update in a subsequent commit?
cc @bryanck
Thanks in advance.