airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Salesforce: Extract of history object with incremental append on CreatedDate is duplicating rows #29455

Closed: chancekbarkley closed this issue 6 months ago

chancekbarkley commented 1 year ago

Connector Name

source-salesforce

Connector Version

2.1.0 on platform version 0.50.7

What step the error happened?

Other

Relevant information

When syncing some of the History objects from Salesforce to Snowflake, the output appears to contain duplicate rows.

We tried resetting the data and resyncing, without success.

In the screenshot, the ID field should be unique per row, yet it is repeated. As you can see, the Airbyte hash is not unique either. The replication settings are configured for Incremental | Append with CreatedDate as the cursor, but the CreatedDate is the same across all 3 records. [screenshot]

It is worth noting that the manually triggered first sync failed, the automatic retry succeeded, and the scheduled sync also succeeded. For a single history record, there is one row corresponding to each of the 3 sync jobs.
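One plausible mechanism, assumed here rather than confirmed anywhere in this thread: if the incremental read compares the cursor inclusively (CreatedDate >= saved cursor), then the record(s) sharing the latest CreatedDate are read again on every sync, and an append-only destination writes them again each time. The sketch below simulates that with hypothetical in-memory data and a hypothetical `incremental_sync` helper; it is not the Salesforce connector's actual code.

```python
from datetime import datetime

# Hypothetical stand-in for a Salesforce history object; the field names
# mirror Id and CreatedDate, the values are made up for illustration.
SOURCE = [
    {"Id": "017A", "CreatedDate": datetime(2023, 8, 1, 12, 0)},
    {"Id": "017B", "CreatedDate": datetime(2023, 8, 2, 9, 30)},
]


def incremental_sync(source, destination, state):
    """Append every record whose cursor is >= the saved cursor (inclusive bound)."""
    cursor = state.get("CreatedDate")
    batch = [r for r in source if cursor is None or r["CreatedDate"] >= cursor]
    destination.extend(dict(r) for r in batch)  # append-only: never updates or deletes
    if batch:
        state["CreatedDate"] = max(r["CreatedDate"] for r in batch)
    return len(batch)


destination, state = [], {}
for _ in range(3):  # three consecutive syncs, no new source records in between
    incremental_sync(SOURCE, destination, state)

ids = [r["Id"] for r in destination]
print(ids)  # ['017A', '017B', '017B', '017B']
```

Under this assumption the boundary record gains one row per sync, which would match the "one row per sync job" pattern described above; an exclusive bound (`>`) would avoid the repeats but risks skipping records that share a timestamp with the saved cursor.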

I will attempt to upgrade the source and destination connector versions now and reset the tables again tonight to see if it makes any difference.

Relevant log output

No response

Contribute

chancekbarkley commented 1 year ago

After upgrading both connectors, resetting this connection, and running it twice, at least 8 objects have at least one duplicated Salesforce ID, where both rows share the same CreatedDate and the same Airbyte hash.
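Counts like these are usually obtained in the warehouse with a `GROUP BY Id HAVING COUNT(*) > 1` query per table. As a rough equivalent, the sketch below does the same check in Python over rows exported from a destination table; the `duplicated_ids` helper and the sample rows are hypothetical.

```python
from collections import Counter


def duplicated_ids(rows, key="Id"):
    """Return {id: count} for every primary-key value that occurs more than once."""
    counts = Counter(row[key] for row in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical rows exported from one history table.
rows = [
    {"Id": "017A", "CreatedDate": "2023-08-01T12:00:00Z"},
    {"Id": "017B", "CreatedDate": "2023-08-02T09:30:00Z"},
    {"Id": "017B", "CreatedDate": "2023-08-02T09:30:00Z"},
]
print(duplicated_ids(rows))  # {'017B': 2}
```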

My team could possibly contribute to the fix if you can help us determine the best place to start looking.

maxi297 commented 6 months ago

Hi @chancekbarkley! It is expected that the source may produce duplicate records. If you want to make sure there is no duplication, Incremental | Append + Deduped is the sync mode you would want to use. Is there a reason you can't use that mode? For now, we will close the issue; we can reopen it once more information is added.
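Conceptually, Append + Deduped keeps one row per primary key in the final table, choosing the row with the greatest cursor value. The sketch below illustrates that idea under stated assumptions: `Id` is the primary key, `CreatedDate` is the cursor, and ties on the cursor go to the copy seen last (standing in for the most recently extracted one). The `dedupe_latest` helper and the `sync` field are hypothetical, and Airbyte's own tie-breaking may differ.

```python
from datetime import datetime


def dedupe_latest(rows, primary_key="Id", cursor="CreatedDate"):
    """Keep one row per primary key: the one with the greatest cursor value.

    Ties on the cursor are resolved in favour of the row seen last.
    """
    latest = {}
    for row in rows:
        pk = row[primary_key]
        current = latest.get(pk)
        if current is None or row[cursor] >= current[cursor]:
            latest[pk] = row
    return list(latest.values())


# Hypothetical append-only history: the same record written by three syncs.
appended = [
    {"Id": "017A", "CreatedDate": datetime(2023, 8, 1, 12, 0), "sync": 1},
    {"Id": "017B", "CreatedDate": datetime(2023, 8, 2, 9, 30), "sync": 1},
    {"Id": "017B", "CreatedDate": datetime(2023, 8, 2, 9, 30), "sync": 2},
    {"Id": "017B", "CreatedDate": datetime(2023, 8, 2, 9, 30), "sync": 3},
]
final = dedupe_latest(appended)
print([(r["Id"], r["sync"]) for r in final])  # [('017A', 1), ('017B', 3)]
```

Because history records are immutable in Salesforce, every duplicate copy carries the same content, so which copy wins a tie does not change the deduplicated result here.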