alexrudall / ruby-openai


Access to rate limit headers #419

Closed · martinjaimem closed this issue 8 months ago

martinjaimem commented 8 months ago

Issue Description

Currently, the OpenAI API responds with several headers that provide valuable information about rate limits, such as x-ratelimit-limit-requests, x-ratelimit-remaining-requests, and their token-based counterparts. These are crucial for understanding when and why a client is being rate limited (as detailed in OpenAI's rate limits documentation).

FIELD | SAMPLE VALUE | DESCRIPTION
-- | -- | --
x-ratelimit-limit-requests | 60 | The maximum number of requests that are permitted before exhausting the rate limit.
x-ratelimit-limit-tokens | 150000 | The maximum number of tokens that are permitted before exhausting the rate limit.
x-ratelimit-remaining-requests | 59 | The remaining number of requests that are permitted before exhausting the rate limit.
x-ratelimit-remaining-tokens | 149984 | The remaining number of tokens that are permitted before exhausting the rate limit.
x-ratelimit-reset-requests | 1s | The time until the rate limit (based on requests) resets to its initial state.
x-ratelimit-reset-tokens | 6m0s | The time until the rate limit (based on tokens) resets to its initial state.
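
For reference, these headers are plainly visible when the endpoint is called outside the gem. The sketch below hits the chat completions endpoint directly with Faraday (which ruby-openai uses under the hood) purely to show where the values in the table come from; the model name and the `OPENAI_API_KEY` environment variable are assumptions for illustration.

```ruby
require "faraday"

# Minimal sketch: call the chat completions endpoint directly with Faraday
# so the rate-limit headers described in the table above can be inspected.
connection = Faraday.new(url: "https://api.openai.com") do |f|
  f.request :json   # encode the request body as JSON
  f.response :json  # parse the response body as JSON
end

response = connection.post("/v1/chat/completions") do |req|
  req.headers["Authorization"] = "Bearer #{ENV.fetch("OPENAI_API_KEY")}"
  req.body = {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "ping" }]
  }
end

# Faraday keeps the headers on the response object; the gem currently
# discards them and returns only the parsed body.
puts response.headers["x-ratelimit-remaining-requests"]
puts response.headers["x-ratelimit-reset-requests"]
```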

However, the challenge lies in accessing these headers. At present, every method in the gem returns only the body of OpenAI's response, so users cannot reach the headers without resorting to monkey-patching. This is particularly limiting for anyone using OpenAI at scale: users who implement retry mechanisms, or who simply want to understand why and when they are being rate limited, would benefit significantly from having these headers exposed. A rough sketch of the kind of workaround this currently forces follows below.
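
As an illustration only: one common workaround is a custom Faraday middleware that captures the rate-limit headers from each response. The class name and the `last_headers` accessor below are hypothetical, and wiring it into the connection ruby-openai builds internally is exactly the part that requires reaching into the gem.

```ruby
require "faraday"

# Hypothetical middleware that records the x-ratelimit-* headers of the most
# recent response so calling code can inspect them after a request.
class RateLimitCapture < Faraday::Middleware
  class << self
    attr_accessor :last_headers
  end

  def on_complete(env)
    self.class.last_headers = env.response_headers
                                 .select { |key, _| key.to_s.start_with?("x-ratelimit") }
  end
end

# This still has to be inserted into the Faraday connection the gem constructs
# internally, which is where the monkey-patching comes in.
```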

Proposed Solution

To address this, I suggest we consider exposing the response headers in some manner.

Implementation Ideas

Two potential implementation approaches come to mind, though I'm open to additional suggestions:

I believe either of these solutions would significantly improve the user experience by providing more control and insight into the API's rate-limiting mechanisms.
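
To make the idea concrete, here is one possible shape for an exposed-headers interface: a small value object carrying both the parsed body and the rate-limit headers. Everything in this sketch (the struct, its accessors, and the usage shown in comments) is a hypothetical illustration, not something that exists in ruby-openai today or a statement of how it should be implemented.

```ruby
# Purely illustrative sketch of one way headers could be surfaced alongside
# the body. All names here are hypothetical.
RateLimitedResponse = Struct.new(:body, :headers, keyword_init: true) do
  def remaining_requests
    headers["x-ratelimit-remaining-requests"]&.to_i
  end

  def reset_requests_in
    headers["x-ratelimit-reset-requests"] # e.g. "1s"
  end
end

# Hypothetical usage, assuming the gem returned such an object (it does not today):
# response = client.chat(parameters: { model: "gpt-3.5-turbo", messages: [...] })
# sleep_and_retry if response.remaining_requests == 0
```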


By the way, thanks so much for this 💎