airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-salesforce] Account stream missing updates on incremental mode #42490

Open FredericoCoelhoNunes opened 3 months ago

FredericoCoelhoNunes commented 3 months ago

Connector Name

source-salesforce

Connector Version

2.5.23

What step the error happened?

During the sync

Relevant information

I have a Salesforce -> S3 connector configured. One of the streams that is being synced is the Account stream, in Incremental | Append mode, and runs hourly.

I discovered one account for which no updates were being picked up. This record was created on 17 July and was updated several times on Salesforce between that day and the 21st. These updates are visible in the Salesforce interface and changelog, and the SystemModstamp on Salesforce is the 21st (the date of the last update).

However, the updates were not present in the S3 data: there was only one row for this record, showing it exactly as it was when created (with SystemModstamp = 17 July). After running a full refresh (i.e. clearing the data from the Account stream), the record on S3 was updated to its current state.
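For context, here is a minimal sketch of what I would expect Incremental | Append to do for this record. It is not the connector's actual code: it assumes the stream's cursor field is SystemModstamp, and `incremental_append`, `ts`, the account ID and the year are hypothetical placeholders.

```python
from datetime import datetime, timezone


def incremental_append(source_records, destination_rows, cursor_value,
                       cursor_field="SystemModstamp"):
    """Append every source record newer than the saved cursor (hypothetical helper).

    Returns the new cursor value to save as state for the next run.
    """
    new_rows = [r for r in source_records if r[cursor_field] > cursor_value]
    new_rows.sort(key=lambda r: r[cursor_field])
    destination_rows.extend(new_rows)
    # The cursor only moves forward, to the newest value emitted in this run.
    return max([cursor_value] + [r[cursor_field] for r in new_rows])


def ts(day, hour=0):
    # Placeholder year: the real dates are 17 and 21 July.
    return datetime(2024, 7, day, hour, tzinfo=timezone.utc)


destination = []
state = ts(16)

# Sync shortly after the account is created on the 17th.
account = {"Id": "001-EXAMPLE", "Name": "Acme", "SystemModstamp": ts(17, 9)}
state = incremental_append([account], destination, state)

# Sync shortly after the last update on the 21st.
account = {**account, "Name": "Acme Ltd", "SystemModstamp": ts(21, 14)}
state = incremental_append([account], destination, state)

print(len(destination))  # one row per observed version of the record
```

Under these assumptions the destination ends up with two rows for the account, one per version seen by a sync. In my case only the first row ever arrived.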

The connector has synced data successfully every hour from the 17th until today (the 24th), with no recent failures; the runs around the time of the last update were also successful, with a few dozen records being emitted.

I have also verified that not all accounts have this issue: several accounts have had their updates synced to S3.

Any insights as to what might be causing this issue would be greatly appreciated.

SystemModstamp on Salesforce: image

Successful syncs around the time of the update: image

Proof of the data not being updated on S3: unfortunately I forgot to take a screenshot before running the full refresh, so I don't have an image to show this. However, I queried the unmodified data directly on S3 (the direct output of the Airbyte syncs) by account ID, and only one row was available for this account, with an outdated SystemModstamp.
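The check I did by hand could be automated roughly as follows. This is only a sketch: it assumes the SystemModstamp values from Salesforce and the rows from the S3 output have already been loaded into plain Python structures, and `find_stale_ids`, the IDs and the year are hypothetical.

```python
from datetime import datetime, timezone


def find_stale_ids(source_modstamps, synced_rows,
                   id_field="Id", cursor_field="SystemModstamp"):
    """Return IDs whose newest synced row is older than the source, or missing.

    source_modstamps: dict mapping record ID to its current SystemModstamp.
    synced_rows: iterable of dicts read from the destination output.
    """
    latest_synced = {}
    for row in synced_rows:
        record_id = row[id_field]
        stamp = row[cursor_field]
        if record_id not in latest_synced or stamp > latest_synced[record_id]:
            latest_synced[record_id] = stamp

    stale = []
    for record_id, source_stamp in source_modstamps.items():
        synced_stamp = latest_synced.get(record_id)
        if synced_stamp is None or synced_stamp < source_stamp:
            stale.append(record_id)
    return sorted(stale)


def utc(day):
    # Placeholder year, used only to build example timestamps.
    return datetime(2024, 7, day, tzinfo=timezone.utc)


# Example data: "A" was updated on the 21st but only its version from the 17th
# was synced; "B" is up to date; "C" never reached the destination.
source = {"A": utc(21), "B": utc(20), "C": utc(19)}
synced = [
    {"Id": "A", "SystemModstamp": utc(17)},
    {"Id": "B", "SystemModstamp": utc(18)},
    {"Id": "B", "SystemModstamp": utc(20)},
]

stale_ids = find_stale_ids(source, synced)
print(stale_ids)
```

Running a comparison like this after each hourly sync would show how many accounts are affected and when they start to fall behind.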

Relevant log output

No response

FredericoCoelhoNunes commented 2 months ago

Hi all, I detected the issue again today with more accounts. Here are some screenshots showing the problem for account ID 001SZ00000HifiVYAR:

  1. The account is available on Salesforce, created and updated on 01/09 (I had to strike through the personal data):

image

  2. Airbyte runs and configuration; as you can see, the runs on that date were successful:

image

image

  3. Querying the data in S3 (no intermediary modelling, this is a query on the actual Airbyte output) shows no results:

image

  4. After manually triggering a full refresh of the Account stream, the data correctly shows up:

image

image

PS: we also noticed that the Opportunity matching this account followed the same pattern: it was not synced during the incremental loads and only became available after a full load.

@girarda I noticed you assigned some labels to this issue a couple of weeks ago. Do you know if there is any update? Has the issue been confirmed, or could the problem be something obscure on our end? How can I help detect the issue?

Thank you 🙏