dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0

Add 'retry' response action for generic rest api source #1786

Open alex-fedorenk0 opened 2 weeks ago

alex-fedorenk0 commented 2 weeks ago

Feature description

Extend the default response actions with request retry based on response content.

Are you a dlt user?

I'd consider using dlt, but it's lacking a feature I need.

Use case

I am considering dlt with the declarative REST API source as a replacement for a custom ingestion script. The API I am using can return incomplete pages of data with response status 200 but an 'error' key in the response content. Currently this leads to missing data at the destination. In my case the incremental cursor value is read from the last page, so earlier incomplete pages are never updated.

Proposed solution

Add a 'retry' response action based on response content, which will trigger a retry of the REST API request. The error key can be looked up as a content substring, but it would be great to be able to check for actual keys. As an example, in my case it is "status":"failure" and "error":{"message": ..., "details": ...}. I understand that I could use a custom Client instance for this use case, but it seems like a good addition to the declarative source.
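To illustrate the difference between a substring lookup and a check on actual keys, here is a minimal sketch. It is not dlt's API: the function name `needs_retry` is hypothetical, and treating an unparseable body as retryable is an assumption made for the example.

```python
import json


def needs_retry(body: str) -> bool:
    """Return True if a 200 response body signals a failed or incomplete page.

    Hypothetical helper, not part of dlt. It inspects top-level keys of the
    parsed JSON instead of searching the raw text.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Assumption: a body that is not valid JSON counts as a failed page.
        return True
    if not isinstance(payload, dict):
        return False
    # Keys taken from the example in this issue: "status":"failure" and "error".
    return payload.get("status") == "failure" or "error" in payload


failed = '{"status": "failure", "error": {"message": "x", "details": "y"}}'
healthy = '{"data": [{"level": "error"}]}'

# A substring check flags both bodies, because "error" also appears as a value.
substring_hits = ['"error"' in failed, '"error"' in healthy]
# The key check only flags the body that really carries an error key.
key_hits = [needs_retry(failed), needs_retry(healthy)]
```

The substring check reports a false positive for the healthy page here, which is the reason a key-based match would be more reliable.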

Related issues

No response

burnash commented 1 week ago

Hey @alex-fedorenk0, thanks for the suggestion. How does the API you're calling behave on retries? Does it return a complete page on a subsequent call?

alex-fedorenk0 commented 1 week ago

It may return the complete page on the next attempt, or fail again, depending on server load. I don't know if this behavior (returning HTTP 200 with an error key and incomplete data) is common enough. In the custom client we raise tenacity's TryAgain exception if the 'error' key is present in the response content or the total record count is not equal to the expected count. For us it is better to fail the extraction completely and keep the existing state until the next scheduled run than to get incomplete pages of data.
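The behavior described above can be sketched without dependencies roughly as follows. This is an illustration under assumptions, not the actual custom client: a plain loop stands in for tenacity's TryAgain, and the names `fetch_complete_page`, `IncompletePageError`, the `"data"` key, and the `expected_count` parameter are all hypothetical.

```python
import time
from typing import Any, Callable, Dict


class IncompletePageError(Exception):
    """Hypothetical error raised when a page stays incomplete after all attempts."""


def fetch_complete_page(
    fetch: Callable[[], Dict[str, Any]],
    expected_count: int,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Dict[str, Any]:
    """Call `fetch` until the page is complete, or fail the whole extraction."""
    for attempt in range(1, max_attempts + 1):
        payload = fetch()
        # Assumption: records live under a top-level "data" key.
        records = payload.get("data", [])
        if "error" not in payload and len(records) == expected_count:
            return payload
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    # Failing here aborts the run, so the existing state is kept
    # instead of loading an incomplete page.
    raise IncompletePageError(f"page incomplete after {max_attempts} attempts")


# Usage: the first response is incomplete, the second one is complete.
responses = iter([
    {"error": {"message": "overloaded"}, "data": [1]},
    {"data": [1, 2, 3]},
])
page = fetch_complete_page(lambda: next(responses), expected_count=3)
```

Raising after the last attempt, rather than returning the partial page, matches the stated preference to fail the extraction and retry on the next scheduled run.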