Azure / azure-sdk-for-rust

This repository is for the active development of the Azure SDK for Rust. For consumers of the SDK, we recommend visiting Docs.rs and looking up the docs for any of the libraries in the SDK.
MIT License

Check for `Retry-After` in `RetryPolicy` in the case of `429` status code #618

Open yoshuawuyts opened 2 years ago

yoshuawuyts commented 2 years ago

As per the MDN page on HTTP Status Code 429, when a 429 status is sent it may include a Retry-After HTTP header which specifies how long to wait before trying again. Currently our RetryPolicy does not check for this header, and we should make it so it does.
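To make the request concrete, here is a minimal sketch of reading the header, assuming the status code and raw header value are already available. `retry_after_delay` is a hypothetical helper, not part of the SDK, and it handles only the delay-seconds form of `Retry-After`; the HTTP-date form is treated as absent.

```rust
use std::time::Duration;

/// Hypothetical helper (not an SDK API): returns the wait requested by a
/// `Retry-After` header on a 429 response, if it is given in delay-seconds form.
fn retry_after_delay(status: u16, retry_after: Option<&str>) -> Option<Duration> {
    // Only 429 responses are considered, as described in this issue.
    if status != 429 {
        return None;
    }
    // `Retry-After` may also carry an HTTP-date; that form is not handled
    // in this sketch, so it fails to parse as an integer and yields `None`.
    let secs: u64 = retry_after?.trim().parse().ok()?;
    Some(Duration::from_secs(secs))
}
```

A caller would pass the response status and the header value (if any), then fall back to the policy's own delay when the result is `None`.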

gorzell commented 2 years ago

This should be pretty straightforward to fix or look at. I think the main question is how it should interact with the other settings of the policy. For example, what if it is longer than max_delay or max_elapsed? I would think you would just take the min?

It might also need to be added to the trait, because the function with access to the headers isn't what picks the duration between retries.
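As a rough illustration of both points, the sketch below passes the parsed header into the method that picks the delay and then takes the min with `max_delay`. The trait `DelayPolicy`, the struct `FixedPolicy`, and the `retry_after` parameter are all hypothetical names made up for this example; they are not the existing `RetryPolicy` trait.

```rust
use std::time::Duration;

/// Hypothetical trait for illustration only; not the SDK's `RetryPolicy`.
trait DelayPolicy {
    /// Picks the delay before the next attempt. `retry_after` is the
    /// already-parsed server hint, if the response carried one.
    fn sleep_duration(&self, retry_count: u32, retry_after: Option<Duration>) -> Duration;
}

/// Hypothetical policy with a fixed delay and an upper bound.
struct FixedPolicy {
    delay: Duration,
    max_delay: Duration,
}

impl DelayPolicy for FixedPolicy {
    fn sleep_duration(&self, _retry_count: u32, retry_after: Option<Duration>) -> Duration {
        // Prefer the server's hint when present, but never exceed
        // `max_delay` (the "just take the min" option).
        retry_after.unwrap_or(self.delay).min(self.max_delay)
    }
}
```

With this shape, a 60-second `Retry-After` against a 10-second `max_delay` would be cut down to 10 seconds, which may mean retrying before the service is ready.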

gorzell commented 2 years ago

Do we know if Azure is actually including this header? I have never seen it, but we also usually get ServerBusy | Service Unavailable (503). I also don't see 429 in the listed common or blob store error codes. Maybe they are used by some of the other services?

cataggar commented 2 years ago

You can see which services are using 429 by looking at the specs: https://cs.github.com/?scopeName=All+repos&scope=&q=%22429%22%3A+repo%3AAzure%2Fazure-rest-api-specs++language%3AJSON

gorzell commented 2 years ago

And since @cataggar pointed out this is documented in code, the header is indeed actually used sometimes: https://cs.github.com/Azure/azure-rest-api-specs?q=%2F%22429%22%3A.*%5Cn.*headers.*%5Cn.*Retry%2F+repo%3AAzure%2Fazure-rest-api-specs++language%3AJSON

So back to the question of what do you do when it conflicts with the policy.

rylev commented 2 years ago

It seems like the specific delay between retries should be the max of the Retry-After time and the policy's specified max_delay. Essentially, max_delay is a best attempt unless the service asks for more delay. And once max_elapsed is reached, even if the delay ended up being longer than max_delay, we should no longer retry.
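One possible reading of this rule, as a sketch: the policy's own delay is capped at `max_delay`, a larger `Retry-After` overrides that cap, and `max_elapsed` is a hard stop. `next_delay` and its parameters are hypothetical names for this example, not SDK APIs.

```rust
use std::time::Duration;

/// Hypothetical helper (not an SDK API). Returns `None` when no further
/// retry should be attempted.
fn next_delay(
    policy_delay: Duration,
    max_delay: Duration,
    retry_after: Option<Duration>,
    elapsed: Duration,
    max_elapsed: Duration,
) -> Option<Duration> {
    // `max_elapsed` is a hard limit: once reached, stop retrying.
    if elapsed >= max_elapsed {
        return None;
    }
    // `max_delay` caps the policy's own delay...
    let capped = policy_delay.min(max_delay);
    // ...but a longer server-requested delay takes precedence.
    Some(match retry_after {
        Some(hint) => capped.max(hint),
        None => capped,
    })
}
```

For instance, with a 10-second `max_delay` and a 30-second `Retry-After`, the sketch waits 30 seconds; the next call returns `None` if that wait pushed `elapsed` to or past `max_elapsed`.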