We are seeing duplicate records within the same partition when we configure primary keys with upsert-mode enabled on the table. We have tried the two setups below:
Reading CDC data with cdc-field and upsert-mode enabled in the Iceberg connector.
Reading data from Kafka with a primary key.
The issue occurs in both setups.
Ideally we should see only one record per primary key in a partition. However, the results are inconsistent: a small percentage of records are duplicated within the same partition. We would appreciate help understanding why this occurs; we suspect a concurrency issue in the commit coordinator.
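To make "duplicated within the same partition" concrete, here is a minimal sketch of the check, assuming the table rows have been exported as (partition, primary key) pairs. The function name `duplicate_rate` and the sample rows are hypothetical and only illustrate the counting; they are not taken from the actual table.

```python
from collections import Counter


def duplicate_rate(rows):
    """Count primary keys that appear more than once within a partition.

    rows: iterable of (partition, primary_key) tuples (hypothetical export format).
    Returns (duplicated_keys, distinct_keys, fraction_duplicated).
    """
    counts = Counter(rows)  # occurrences of each (partition, key) pair
    distinct = len(counts)
    duplicated = sum(1 for n in counts.values() if n > 1)
    fraction = duplicated / distinct if distinct else 0.0
    return duplicated, distinct, fraction


# Made-up sample: key 2 appears twice in the same partition.
sample = [
    ("2024-01-01", 1),
    ("2024-01-01", 2),
    ("2024-01-01", 2),
    ("2024-01-02", 1),
]

duplicated, distinct, fraction = duplicate_rate(sample)
print(f"{duplicated} of {distinct} keys duplicated ({fraction:.1%})")
```

With upsert working as expected, the duplicated count should be zero for every partition.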
I am attaching the connector config below: