alexrudall / ruby-openai


Access to rate limit headers #419

Closed · martinjaimem closed this issue 8 months ago

martinjaimem commented 8 months ago

Issue Description

Currently, the OpenAI API responds with several headers that provide valuable information about rate limits, such as x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and their token-based counterparts. These are crucial for understanding when and why a client is being rate limited (as detailed in OpenAI's rate limits documentation).

FIELD | SAMPLE VALUE | DESCRIPTION
-- | -- | --
x-ratelimit-limit-requests | 60 | The maximum number of requests that are permitted before exhausting the rate limit.
x-ratelimit-limit-tokens | 150000 | The maximum number of tokens that are permitted before exhausting the rate limit.
x-ratelimit-remaining-requests | 59 | The remaining number of requests that are permitted before exhausting the rate limit.
x-ratelimit-remaining-tokens | 149984 | The remaining number of tokens that are permitted before exhausting the rate limit.
x-ratelimit-reset-requests | 1s | The time until the rate limit (based on requests) resets to its initial state.
x-ratelimit-reset-tokens | 6m0s | The time until the rate limit (based on tokens) resets to its initial state.
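
For reference, these headers are plainly visible when the endpoint is called outside the gem. The sketch below hits the chat completions endpoint directly with Faraday (which ruby-openai uses under the hood) purely to show where the values in the table come from; the model name and the `OPENAI_API_KEY` environment variable are assumptions for illustration.

```ruby
require "faraday"

# Minimal sketch: call the chat completions endpoint directly with Faraday
# so the rate-limit headers described in the table above can be inspected.
connection = Faraday.new(url: "https://api.openai.com") do |f|
  f.request :json   # encode the request body as JSON
  f.response :json  # parse the response body as JSON
end

response = connection.post("/v1/chat/completions") do |req|
  req.headers["Authorization"] = "Bearer #{ENV.fetch("OPENAI_API_KEY")}"
  req.body = {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "ping" }]
  }
end

# Faraday keeps the headers on the response object; the gem currently
# discards them and returns only the parsed body.
puts response.headers["x-ratelimit-remaining-requests"]
puts response.headers["x-ratelimit-reset-requests"]
```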

However, the challenge lies in accessing these headers. At present, every method in the gem returns only the body of OpenAI's response, so users cannot reach the headers without resorting to monkey-patching. This is particularly limiting for anyone using OpenAI at scale: users who implement retry mechanisms, or who simply want to understand why and when they are being rate limited, would benefit significantly from having these headers exposed. A rough sketch of the kind of workaround this currently forces follows below.
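
As an illustration only: one common workaround is a custom Faraday middleware that captures the rate-limit headers from each response. The class name and the `last_headers` accessor below are hypothetical, and wiring it into the connection ruby-openai builds internally is exactly the part that requires reaching into the gem.

```ruby
require "faraday"

# Hypothetical middleware that records the x-ratelimit-* headers of the most
# recent response so calling code can inspect them after a request.
class RateLimitCapture < Faraday::Middleware
  class << self
    attr_accessor :last_headers
  end

  def on_complete(env)
    self.class.last_headers = env.response_headers
                                 .select { |key, _| key.to_s.start_with?("x-ratelimit") }
  end
end

# This still has to be inserted into the Faraday connection the gem constructs
# internally, which is where the monkey-patching comes in.
```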

Proposed Solution

To address this, I suggest we consider exposing the response headers in some manner.

Implementation Ideas

Two potential implementation approaches come to mind, though I'm open to additional suggestions:

I believe either of these solutions would significantly improve the user experience by providing more control and insight into the API's rate-limiting mechanisms.
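
To make the idea concrete, here is one possible shape for an exposed-headers interface: a small value object carrying both the parsed body and the rate-limit headers. Everything in this sketch (the struct, its accessors, and the usage shown in comments) is a hypothetical illustration, not something that exists in ruby-openai today or a statement of how it should be implemented.

```ruby
# Purely illustrative sketch of one way headers could be surfaced alongside
# the body. All names here are hypothetical.
RateLimitedResponse = Struct.new(:body, :headers, keyword_init: true) do
  def remaining_requests
    headers["x-ratelimit-remaining-requests"]&.to_i
  end

  def reset_requests_in
    headers["x-ratelimit-reset-requests"] # e.g. "1s"
  end
end

# Hypothetical usage, assuming the gem returned such an object (it does not today):
# response = client.chat(parameters: { model: "gpt-3.5-turbo", messages: [...] })
# sleep_and_retry if response.remaining_requests == 0
```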


By the way, thanks so much for this 💎