SAP / cloud-sdk-js

Use the SAP Cloud SDK for JavaScript / TypeScript to reduce development effort when building applications on SAP Business Technology Platform that communicate with SAP solutions and services such as SAP S/4HANA Cloud, SAP SuccessFactors, and many others.
Apache License 2.0
161 stars 53 forks source link

Resilience retry too restrictive on HTTP codes (rate limited API's returning 429) #4726

Closed geert-janklaps closed 3 weeks ago

geert-janklaps commented 1 month ago

Describe the bug

Currently the retry resilience middleware excludes all HTTP codes starting with 4 from actually retrying the request. This is perfectly correct for error like 404, 403, 401... But there's at least one exception HTTP 429 (Too many requests). This is where a retry middleware can be used to slow down the requests by retrying after a timeout.

By adapting following line this exception can be added: https://github.com/SAP/cloud-sdk-js/blob/b96c94c6db1b750a7c07e929da0d1edb93adb2e6/packages/resilience/src/retry.ts#L48

Example: } else if (status.toString().startsWith('4') && status.toString() !== '429') {

Expected behavior Expecting requests with a response code 429 to be retried.

marikaner commented 1 month ago

Hey @geert-janklaps, "too many requests" to me does not sound like an invitation to retry even more ;) But I understand your idea of slowing down requests and trying it after more time passes. However, please consider that even though the retries are executed with an increasing threshold, the first retries are executed within seconds putting more load on the system.

I believe this is not a good default for most use cases, but you could build your own retry middleware and use that instead.

geert-janklaps commented 1 month ago

Hi @marikaner,

I do get your point, on the other hand documentation for API's with these limits suggest reducing the number of requests within the timeframe: image

In this case I'm looking at the SAP Build Process Automation workflow API, which has a rate limit of 150 requests per second. (which our solution clearly doesn't consume, we're at about 2 requests / second, but maybe something else is putting a high load on the API at that time)

A retry mechanism with an exponentially growing timeout perfectly solves the issue for these types of cases.

Anyway, I did implement a custom retry middleware (basically a copy of the one provided by the SAP Cloud SDK) and adapted as described above. So I did solve the issue in my solution, but thought I'd mention this here as well so this can either be implemented in the standard framework or people running into a similar issue can find this solution in the issues and implement something themselves.

If you don't see the need to get this in the standard cloud sdk, feel free to close this issue.

Cheers, Geert-Jan