airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Stripe: Expandable fields missing in Events API may lead to incomplete data #38039

Open strosek opened 5 months ago

strosek commented 5 months ago

Current situation

The Stripe API doesn't support retrieving expandable fields through the Events API. This makes it impossible to construct the full record during incremental sync without retrieving the full objects separately. As a result, existing data may be overwritten with null when it is written to the destination. One example of such an expandable field is Price.tiers.

Reproducing the issue

Create a new Price object with billing_scheme = tiered, and fill in the required fields of the tiers object. Then query for events related to the newly created Price. The results don't include the tiers information.
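The two requests being compared can be sketched as below. This is a rough illustration only: the helper names `events_url` and `price_url` are made up for this comment, and nothing here is taken from the connector code. It only builds the URLs; sending them requires a Stripe secret key.

```python
from urllib.parse import urlencode

STRIPE_API = "https://api.stripe.com/v1"


def events_url(event_type: str, limit: int = 10) -> str:
    """URL listing recent events of one type, e.g. price.created."""
    return f"{STRIPE_API}/events?" + urlencode({"type": event_type, "limit": limit})


def price_url(price_id: str, expand: tuple = ("tiers",)) -> str:
    """URL retrieving a single Price with the given fields expanded."""
    # Stripe takes expansions as repeated expand[] query parameters.
    query = urlencode([("expand[]", field) for field in expand])
    return f"{STRIPE_API}/prices/{price_id}?{query}"


if __name__ == "__main__":
    print(events_url("price.created"))
    print(price_url("price_123"))  # price_123 is a placeholder ID
```

The Price embedded in the event payload from the first URL comes back without tiers, while the second request returns the same Price with tiers populated.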

Desired situation

Connector logic to retrieve updated expandable fields in incremental sync is implemented. Where possible, multiple objects are updated from a single API call for performance. Expandable fields that are not present in a response don't overwrite existing values with null in destinations.
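To make the "don't override with null" requirement concrete, here is a minimal sketch of the intended merge semantics. The function `merge_event_record` and its arguments are hypothetical, and in practice this behavior may have to live in the connector output or in the destination rather than in a helper like this.

```python
from typing import Any, Collection, Dict, Mapping


def merge_event_record(
    stored: Mapping[str, Any],
    incoming: Mapping[str, Any],
    expandable: Collection[str],
) -> Dict[str, Any]:
    """Apply an event payload on top of a stored record.

    Expandable fields that arrive as None (or not at all) keep the
    stored value; every other incoming field overwrites it.
    """
    merged = dict(stored)
    for key, value in incoming.items():
        if key in expandable and value is None:
            continue  # not expanded in the event, keep what we have
        merged[key] = value
    return merged


stored = {"id": "price_123", "active": True, "tiers": [{"up_to": 10}]}
incoming = {"id": "price_123", "active": False, "tiers": None}
result = merge_event_record(stored, incoming, expandable={"tiers"})
```

Here `result` keeps the stored tiers while picking up the new `active` value. One caveat of this approach: a field that was genuinely cleared is indistinguishable from one that simply was not expanded, which is another reason to refetch the full object.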

Proposed solution

We keep track of which fields are expandable throughout the API. Objects with expandable data that also have update events are queried to populate all fields when doing incremental sync.
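A minimal sketch of this idea, under assumptions: the registry `EXPANDABLE_FIELDS` lists only the one example named in this issue, and `hydrate_events` and the `fetch` callable are hypothetical names, not existing connector code. The `fetch` callable stands in for whatever HTTP request retrieves the full object with `expand[]` set.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Object type -> expandable fields. Only price.tiers comes from this issue;
# a real registry would have to be filled in from the Stripe API reference.
EXPANDABLE_FIELDS: Dict[str, Tuple[str, ...]] = {"price": ("tiers",)}

# fetch(object_type, object_id, fields_to_expand) -> full object
Fetch = Callable[[str, str, Tuple[str, ...]], Dict[str, Any]]


def hydrate_events(events: Iterable[Dict[str, Any]], fetch: Fetch) -> Iterator[Dict[str, Any]]:
    """Yield one record per event, refetching objects with expandable fields."""
    for event in events:
        obj = event["data"]["object"]
        fields = EXPANDABLE_FIELDS.get(obj.get("object", ""), ())
        if fields:
            # One extra request per record: this is the expected slowdown.
            yield fetch(obj["object"], obj["id"], fields)
        else:
            yield obj
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; objects with no registered expandable fields pass through untouched and cost no extra request.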

Expected outcome

Data is complete and up-to-date even when expandable data is present in API objects. End-to-end tests confirm that no incorrect nulls are written. As we will perform one HTTP query per record, we expect this to slow the syncs, but we don't have a good measure of how much.

maxi297 commented 5 months ago

Based on this information, it is assumed that the following fields are missing during incremental syncs: