airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

🐛 Source Stripe: missing data in Incremental sync mode #32976

Open kev-datams opened 9 months ago

kev-datams commented 9 months ago

Connector Name

source-stripe

Connector Version

5.0.1

What step the error happened?

During the sync

Relevant information

Hello Airbyte's community,

I identified an issue with the source-stripe Charges stream in Incremental sync mode that leads to missing data in the destination.

The missing payments all appear to systematically have:

NB: some other Blocked payments that do have a failure message are retrieved properly.

Please find attached an example (screenshots: Capture d’écran 2023-11-30 à 11 07 14, Capture d’écran 2023-11-30 à 11 11 12).

Another common pattern is the 402 Stripe API error. It could be an interesting path to follow... If this path is confirmed, it possibly impacts other streams as well.

To be noted:

@davydov-d, as a Stripe expert, could you help investigate? :) I would be happy to contribute as well, but I am not sure where to start... This is currently a blocking point on our side for consuming Stripe data via Airbyte.

Thanks a lot for your help! 👍

EDIT: please check below comment with new elements to go further...

Relevant log output

No response

Contribute

kev-datams commented 9 months ago

A few new elements after deeper testing of Full refresh vs Incremental: it seems to be an issue with pagination management in Incremental sync.

With a start date set to 2023-11-29T00:00:00Z (in config.json for Full refresh), equivalent to timestamp 1701216000 (in state.json for Incremental), I get the logs below:
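The correspondence between the ISO start date and the state timestamp can be double-checked with a few lines of Python. This is only a sanity check, not connector code; the helper name `to_epoch_seconds` is made up for illustration.

```python
from datetime import datetime, timezone


def to_epoch_seconds(iso_utc: str) -> int:
    """Convert an ISO 8601 UTC string such as 2023-11-29T00:00:00Z to Unix seconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing Z means UTC, so attach the UTC timezone explicitly.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


print(to_epoch_seconds("2023-11-29T00:00:00Z"))  # 1701216000
print(to_epoch_seconds("2023-11-30T00:00:00Z"))  # 1701302400
```

The second value matches the `created[lte]` bound of the first request in the logs, i.e. the first slice covers exactly one day.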

Full refresh

python main.py read --config secrets/config.json --catalog secrets/configured_catalog.json --state secrets/state.json | grep py_***my_id***

=> FOUND 👍

python main.py read --config secrets/config.json --catalog secrets/configured_catalog.json --state secrets/state.json --debug | grep "https://api.stripe.com/v1/charges"

=> generates:

{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "request_body": "None", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701216000&created%5Blte%5D=1701302400&limit=100&expand%5B%5D=data.refunds"}}
{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "request_body": "None", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701302401&created%5Blte%5D=1701388801&limit=100&expand%5B%5D=data.refunds"}}
{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "request_body": "None", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701388802&created%5Blte%5D=1701452651&limit=100&expand%5B%5D=data.refunds"}}
{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "request_body": "None", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701216000&created%5Blte%5D=1701302400&limit=100&expand%5B%5D=data.refunds"}}
{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "request_body": "None", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701302401&created%5Blte%5D=1701388801&limit=100&expand%5B%5D=data.refunds&starting_after=py_3OI1uCCBk8J2diIe1k9P14b5"}}

=> pagination management looks fine 👍
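To compare the two runs more easily, the date window and the pagination cursor can be pulled out of each DEBUG line. Below is a minimal sketch that assumes the log format shown above (one JSON object per line, with the request URL under `data.url`); the function name `request_window` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def request_window(log_line: str) -> dict:
    """Extract the slice bounds and pagination cursor from one DEBUG log line."""
    record = json.loads(log_line)
    # parse_qs percent-decodes keys, so created%5Bgte%5D becomes created[gte].
    query = parse_qs(urlsplit(record["data"]["url"]).query)
    return {
        "gte": int(query["created[gte]"][0]),
        "lte": int(query["created[lte]"][0]),
        "starting_after": query.get("starting_after", [None])[0],
    }


# Sample built from the last Full refresh request above (headers omitted).
sample = json.dumps({
    "type": "DEBUG",
    "message": "Making outbound API request",
    "data": {
        "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701302401"
               "&created%5Blte%5D=1701388801&limit=100&expand%5B%5D=data.refunds"
               "&starting_after=py_3OI1uCCBk8J2diIe1k9P14b5"
    },
})
print(request_window(sample))
```

A request without `starting_after` is the first page of a slice; a request with it is a follow-up page.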

Incremental

python main.py read --config secrets/config.json --catalog secrets/configured_catalog.json --state secrets/state.json | grep py_***my_id***

=> NOT FOUND 👎

python main.py read --config secrets/config.json --catalog secrets/configured_catalog.json --state secrets/state.json --debug | grep "https://api.stripe.com/v1/charges"

=> generates:

{"type": "DEBUG", "message": "Making outbound API request", "data": {"headers": "{'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Stripe-Version': '2022-11-15', 'Stripe-Account': 'xxx', 'Authorization': 'Bearer ****'}", "url": "https://api.stripe.com/v1/charges?created%5Bgte%5D=1701216000&created%5Blte%5D=1701302400&limit=100&expand%5B%5D=data.refunds", "request_body": "None"}}

=> pagination management looks broken 👎

... or is it simply a different logging display when debugging is enabled?
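For reference, Stripe list endpoints paginate with a cursor: each response carries a `has_more` flag, and the next request passes the id of the last returned object as `starting_after` (as in the last Full refresh request above). The sketch below shows that loop against an in-memory stand-in; `iterate_pages`, `fake_fetch` and the `CHARGES` data are all hypothetical and are not taken from the connector.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record, following the starting_after cursor until has_more is false."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        # The next page starts after the last object of the current page.
        cursor = page["data"][-1]["id"]


# Hypothetical stand-in for the API: five charges served two at a time.
CHARGES = [{"id": f"py_{n}"} for n in range(5)]


def fake_fetch(starting_after: Optional[str], limit: int = 2) -> dict:
    start = 0
    if starting_after is not None:
        start = next(i for i, c in enumerate(CHARGES) if c["id"] == starting_after) + 1
    chunk = CHARGES[start:start + limit]
    return {"data": chunk, "has_more": start + limit < len(CHARGES)}


print([charge["id"] for charge in iterate_pages(fake_fetch)])
```

Under this model, a single request for a slice is only a problem if its response reported `has_more: true`; the DEBUG output alone does not show that flag.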


Alternative: a bug on Stripe's side?

Another idea: could Stripe be failing to log the charge-creation event that follows the PaymentIntent in the Events object, even though the charge really exists in the Charges object?

=> To guard against similar issues in general, in Incremental mode the connector could also query the stream's own object (here Charges) over the same time window, in addition to the Events object. In 99% of cases this would just return duplicates of the same objects, but in the remaining 1% it would work around the issue identified here.
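The idea above can be sketched as a merge of two record sets keyed by object id. This is only an illustration of the proposal, not how the connector works; the function name `merge_by_id`, the sample records, and the choice to let the event-derived version win on conflict are all assumptions.

```python
from typing import Dict, List


def merge_by_id(from_events: List[dict], from_listing: List[dict]) -> List[dict]:
    """Union of both sources, deduplicated on 'id'.

    Assumption: when both sources return the same id, the event-derived
    record is kept.
    """
    merged: Dict[str, dict] = {record["id"]: record for record in from_listing}
    merged.update({record["id"]: record for record in from_events})
    return list(merged.values())


# Hypothetical data: py_2 exists in Charges but has no matching event.
from_events = [{"id": "py_1", "status": "succeeded"}]
from_listing = [
    {"id": "py_1", "status": "pending"},
    {"id": "py_2", "status": "failed"},
]
print(merge_by_id(from_events, from_listing))
```

In this example the charge `py_2`, which has no event, still reaches the output, while `py_1` appears only once.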

kev-datams commented 8 months ago

or perhaps @girarda, as you seem to have mastered the Stripe connector 🙏

Jgerardopine commented 6 months ago

Hi Kev-datams.

Thanks for your research on the bug and the key insights. I think you are right about the events. It seems the events do not exist. I am not sure whether you were able to check the Events endpoint, as we look at the different events to follow the changes and insert them accordingly.
I will be taking over this issue and will post my findings.

Jgerardopine commented 5 months ago

Hi kev-datams.

I have tried to recreate the behavior you are reporting. I used the Stripe sandbox to create multiple pages. I couldn't find any problems with the behavior of payment_intents as shown in your dashboard, and everything works as expected. What I did realize is that you were testing with the Charges stream. According to Stripe, this stream is deprecated and payment_intents must be used instead:

image.png

So the events may not always be caught through this stream when a payment fails at creation, but they do appear in payment_intents. In my case, I checked both streams and I still see all the payment intents I tested (successful and failed). I even made a Blocked payment just like the one you have, and I see it in the event, payment_intent and charges streams:

image.png

Can you confirm if you can see the payment intent in your payment_intent stream?

In any case, since I have not been able to reproduce it, I will keep this ticket open for a week or so. If the behavior appears again, we can reopen the ticket.

lazebnyi commented 4 months ago

Hi @kev-datams, can you please check the comment above?