Closed paul-chrlt closed 1 year ago
Hello. We still have this issue. Have you had a chance to look at it yet? We are stuck on Salesforce connector 1.0.2, as all later connector versions create duplicate identifiers on one of our tables (tested on 2.0.1, 2.0.5, and 2.0.6 recently). Many thanks for your help.
Ran into the same issue: full_refresh always creates duplicates. It seems to me the bug is on this line: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py#L476
We should not use `>=` there, because it re-includes the rows from the previous batch.
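To illustrate the point, here is a standalone sketch (not the connector's actual code; `next_batch` and the sample records are made up) of why an inclusive `>=` comparison on the cursor field re-emits records at the batch boundary:

```python
# Illustrative sketch (not the actual connector code): why filtering on a
# cursor field with ">=" re-emits records from the previous batch.

records = [
    {"Id": "a", "SystemModstamp": "2023-01-01T00:00:00Z"},
    {"Id": "b", "SystemModstamp": "2023-01-02T00:00:00Z"},
    {"Id": "c", "SystemModstamp": "2023-01-02T00:00:00Z"},
]

def next_batch(records, last_cursor, inclusive):
    """Return the records whose cursor is at/past last_cursor."""
    if inclusive:
        return [r for r in records if r["SystemModstamp"] >= last_cursor]
    return [r for r in records if r["SystemModstamp"] > last_cursor]

# Suppose the previous batch ended after reading record "b".
last_cursor = "2023-01-02T00:00:00Z"

# ">=" re-includes "b", so "b" appears in two batches (a duplicate) ...
print([r["Id"] for r in next_batch(records, last_cursor, inclusive=True)])   # ['b', 'c']

# ... while a plain ">" would also skip "c", which shares the same timestamp
# and has not been read yet, so flipping the operator alone can drop records.
print([r["Id"] for r in next_batch(records, last_cursor, inclusive=False)])  # []
```

A robust fix usually pairs the inclusive comparison with deduplication by primary key (or finer-grained checkpointing), rather than simply changing `>=` to `>`.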
@paul-chrlt Hi, can you please elaborate on which streams you are experiencing this problem with? From the testing I performed, I was not able to reproduce it exactly. There is a PR that should add checkpointing to bulk streams (https://github.com/airbytehq/airbyte/pull/24888), which might resolve the issue you are having, but I need to be sure whether you still have this problem on the latest version of the Salesforce connector. I also created a PR with the fixes that @poolmaster proposed (https://github.com/airbytehq/airbyte/pull/24779), but I need to be sure this indeed covers the streams you have problems with.
Hi @arsenlosenko. We are experiencing this issue on the streams with the highest number of records:
The other streams have fewer than 10k records, and we never had duplicates on those. We will try the latest version of the Salesforce connector; I'll keep you informed.
@paul-chrlt Hi, thanks for the clarification. We will let you know when the changes that should resolve this issue (https://github.com/airbytehq/airbyte/pull/24888) are merged, so you can try to sync the streams in question again.
Environment
Current Behavior
Duplicated rows have been created since the Salesforce connector update. The sync mode is set to Full Refresh | Overwrite.
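One way to confirm the behavior is to count occurrences of the primary key in the synced data; a minimal sketch (the rows and Id values here are made up for illustration):

```python
from collections import Counter

# Hypothetical rows as they might land in the destination after a
# Full Refresh | Overwrite sync (Id values are illustrative only):
rows = [
    {"Id": "001A"}, {"Id": "001B"}, {"Id": "001B"}, {"Id": "001C"},
]

# Any Id appearing more than once indicates a duplicated row.
counts = Counter(r["Id"] for r in rows)
duplicates = {sf_id: n for sf_id, n in counts.items() if n > 1}
print(duplicates)  # {'001B': 2}
```

With Full Refresh | Overwrite, every Salesforce Id should appear exactly once after the sync, so any non-empty result here reproduces the reported behavior.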
Expected Behavior
There were no duplicates using Salesforce connector 1.0.2. The only change we made is the Salesforce connector update.
Steps to Reproduce