Expose rate limits in headers to all endpoints of AzureOpenAI API

Azure / azure-rest-api-specs

The source for REST API specifications for Microsoft Azure.

Expose rate limits in headers to all endpoints of AzureOpenAI API #26884

Open drobiazko opened 9 months ago

drobiazko commented 9 months ago

At the moment, the only way to handle TPM and RPM rate limits is to receive an HTTP 429 error response and wait until the limit resets.

A better approach would be to prevent HTTP 429 errors proactively. For example, the OpenAI API exposes rate limit headers in each response. These headers can be used to choose a delay between requests: the smaller the value in the x-ratelimit-remaining-requests header, the longer the delay.
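As a rough illustration of that idea, here is a minimal Python sketch that maps the x-ratelimit-remaining-requests header to a delay. The function name, the linear scaling, and the `max_delay` and `low_water_mark` parameters are assumptions made for this example and are not part of any Azure or OpenAI SDK. If the header is missing or unparsable, the sketch simply applies no delay.

```python
from typing import Mapping


def delay_before_next_request(
    headers: Mapping[str, str],
    max_delay: float = 2.0,    # hypothetical upper bound on the delay, in seconds
    low_water_mark: int = 20,  # hypothetical threshold below which we start slowing down
) -> float:
    """Return a delay in seconds that grows as the remaining request budget shrinks."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-ratelimit-remaining-requests")
    if raw is None:
        # Header not present: this sketch does not throttle in that case.
        return 0.0
    try:
        remaining = int(raw)
    except ValueError:
        # Unparsable value: treat it like a missing header.
        return 0.0
    if remaining >= low_water_mark:
        return 0.0
    remaining = max(remaining, 0)
    # Linear ramp: 0 s at the low-water mark, max_delay when nothing remains.
    return max_delay * (1 - remaining / low_water_mark)
```

A client could call this with the headers of the previous response and sleep for the returned number of seconds before sending the next request. The same pattern would apply to x-ratelimit-remaining-tokens.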

Here is the list of all OpenAI rate limit headers. Is it possible to add them to the Azure OpenAI API?

[Screenshot: Rate limits - OpenAI API 2023-11-29 14-25-10]

jayendranarumugam commented 8 months ago

@drobiazko I think I'm able to see those headers now. Can you check?

yuma-brendan commented 4 months ago

This is still an issue. The only headers available are x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. Having the rest of them is crucial for implementing good client-side rate limiting.

mlesniak commented 4 months ago

We'd be interested in this as well for the same reasons as @yuma-brendan.

scottberres-tr commented 3 months ago

The response headers with the two values above are flaky. For our dedicated endpoints, we rarely get the x-ratelimit-remaining-tokens header at all. This thread is quite old. Are there plans to add the other standard OpenAI response headers, and to make them consistent across shared and dedicated endpoints?
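For anyone who wants to measure how often the headers are missing on their own deployments, here is a small Python sketch that reports which rate limit headers are absent from a response. The header names are taken from OpenAI's documented list; the function name and the idea of logging the result per endpoint are assumptions made for this example.

```python
from typing import Iterable, List, Mapping

# Rate limit headers documented for the OpenAI API.
EXPECTED_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-requests",
    "x-ratelimit-reset-tokens",
)


def missing_rate_limit_headers(
    headers: Mapping[str, str],
    expected: Iterable[str] = EXPECTED_HEADERS,
) -> List[str]:
    """Return the expected header names that are absent from a response."""
    # HTTP header names are case-insensitive, so compare in lower case.
    present = {name.lower() for name in headers}
    return [name for name in expected if name not in present]
```

Logging the result for each response, tagged with the endpoint type, would show whether the gaps differ between shared and dedicated endpoints.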