Open drobiazko opened 9 months ago
@drobiazko I think I am able to see those headers now. Can you check?
This is still an issue: the only headers available are x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens. Having the rest of them is crucial for implementing good client-side rate limiting.
We'd be interested in this as well for the same reasons as @yuma-brendan.
The two response headers mentioned above are flaky. On our dedicated endpoints we rarely receive x-ratelimit-remaining-tokens at all. This thread is quite old. Are there plans to add the other standard OpenAI response headers, and to make them consistent across shared and dedicated endpoints?
At the moment, the only way to handle TPM and RPM rate limits is to receive an HTTP 429 error response and wait until the limit is reset.
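For reference, the reactive approach described above can be sketched roughly as follows. This is only an illustration, not taken from any SDK: `send` is a hypothetical callable returning `(status, headers, body)`, header names are assumed to be lower-cased, and the presence of a `retry-after` header on 429 responses is an assumption, so the sketch falls back to capped exponential backoff when it is missing or unparsable.

```python
import time
from typing import Any, Callable, Mapping, Tuple

def wait_seconds(headers: Mapping[str, str], attempt: int,
                 base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait after a 429 response.

    Uses `retry-after` (in seconds) when it parses as a number,
    otherwise capped exponential backoff: base * 2**attempt.
    """
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            return min(max(float(raw), 0.0), cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff
    return min(base * (2 ** attempt), cap)

def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], Any]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    status, body = 429, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(wait_seconds(headers, attempt))
    return status, body
```

The drawback is visible in the loop: every wait is preceded by a failed request.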
A better way to handle rate limits would be to prevent HTTP 429 errors proactively. For example, the OpenAI API exposes rate limit headers in each response. These headers can be used to decide on a delay between requests: the smaller the value in the x-ratelimit-remaining-requests header, the longer the delay would be.
Here is the list of all OpenAI rate limit headers. Is it possible to add them to the Azure OpenAI API?
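To make the idea concrete, here is a minimal sketch of how a client might turn x-ratelimit-remaining-requests into a delay. The function name, the `low_water` threshold, and the delay values are all hypothetical choices for illustration, and header names are assumed to be lower-cased. Because the header is not always present, the sketch falls back to a fixed default delay when it is missing or not an integer.

```python
from typing import Mapping, Optional

def _int_header(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Return the header as an int, or None if missing or malformed."""
    raw = headers.get(name)
    try:
        return int(raw) if raw is not None else None
    except ValueError:
        return None

def proactive_delay(headers: Mapping[str, str],
                    min_delay: float = 0.0,
                    max_delay: float = 5.0,
                    default_delay: float = 1.0,
                    low_water: int = 20) -> float:
    """Delay (seconds) before the next request.

    No delay while at least `low_water` requests remain; below that,
    the delay grows linearly up to `max_delay` as remaining approaches 0.
    """
    remaining = _int_header(headers, "x-ratelimit-remaining-requests")
    if remaining is None:
        return default_delay  # header absent: be conservative
    if remaining >= low_water:
        return min_delay
    remaining = max(remaining, 0)
    fraction_used = 1.0 - remaining / low_water
    return min_delay + fraction_used * (max_delay - min_delay)
```

With these defaults, 10 remaining requests give a 2.5 second delay and 0 remaining give the full 5 seconds. Having the reset headers as well would let the delay be derived from the actual reset time instead of a guessed maximum.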