Youssefbenhammouda / openai-ratelimiter

MIT License

Why are the tokens counted differently than OpenAI? #2

Open jucor opened 1 year ago

jucor commented 1 year ago

Hi @blaze-Youssef !

First, thanks for the tool, very useful, since I'm blocked in openlimit by issues similar to the ones you ran into in https://github.com/shobrook/openlimit/issues/4 .

However, I'm a bit stuck trying to understand one thing: what does the argument max_tokens correspond to in https://github.com/blaze-Youssef/openai-ratelimiter/blob/main/openai_ratelimiter/defs.py#L9 , please?

I am trying to understand it, but the way you count tokens, which matches openlimit's counter https://github.com/shobrook/openlimit/blob/master/openlimit/utilities/token_counters.py#L14 , differs from the one in OpenAI's cookbook https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb (see section 6).
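
For reference, my reading of the cookbook's section 6 counter is roughly the sketch below (assuming the gpt-3.5-turbo-0613 / gpt-4-0613 increments; gpt-3.5-turbo-0301 uses 4 tokens per message and -1 per name):

```python
# Sketch of the cookbook's section-6 counter, not this repo's code.
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # each message is framed by <|start|>{role/name}\n ... <|end|>\n
    tokens_per_name = 1     # extra token charged when a "name" field is present
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens
```

Note that this only counts the prompt side; it says nothing about max_tokens, which is where your count diverges from it.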

Would you or @shobrook be able to help clarify this counting, please?

jucor commented 1 year ago

Ah! Found max_tokens in the OpenAI API, as an optional parameter: https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens Do I understand correctly that `n * max_tokens` is here to preemptively count the maximum possible number of output tokens against the rate limit?
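
If so, a minimal sketch of that pessimistic accounting (my reading, not necessarily this repo's exact formula) would be:

```python
# Hypothetical worst-case estimate: the API can return up to `n` completions
# of at most `max_tokens` each, so reserve all of them up front.
def estimated_request_tokens(prompt_tokens: int, max_tokens: int, n: int = 1) -> int:
    return prompt_tokens + n * max_tokens

# e.g. a 1,200-token prompt with n=2 and max_tokens=256 reserves 1,712 tokens
assert estimated_request_tokens(1200, 256, n=2) == 1712
```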

jucor commented 1 year ago

Ungh, now I see that OpenAI's batch-API script uses a different token counter than their tutorial notebook: https://github.com/openai/openai-cookbook/blob/main/examples/api_request_parallel_processor.py#L339 The former does indeed take max_tokens into account, but it is also older than their notebook, which uses different token increments for role names. So I don't know which version of their token counter to believe 😅

Shouldn't a rate limiter, in any case, be updated after the actual completion is returned by the API, to account for the actual number of output tokens?
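
Something like the following pattern is what I have in mind (purely hypothetical names, not this library's API): reserve the pessimistic estimate up front, then settle up with the usage.total_tokens the API actually reports.

```python
# Hypothetical reserve-then-reconcile pattern; TokenBudget is illustrative,
# not part of openai-ratelimiter.
class TokenBudget:
    def __init__(self, tokens_per_minute: int):
        self.remaining = tokens_per_minute

    def reserve(self, estimate: int) -> None:
        self.remaining -= estimate  # pessimistic: prompt + n * max_tokens

    def reconcile(self, estimate: int, actual: int) -> None:
        self.remaining += estimate - actual  # give back the unused headroom

budget = TokenBudget(90_000)
estimate = 1200 + 2 * 256  # prompt tokens + n * max_tokens
budget.reserve(estimate)
# ... make the API call here ...
budget.reconcile(estimate, actual=1350)  # actual = response.usage.total_tokens
```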

Youssefbenhammouda commented 1 year ago

Hi, I will dig into the OpenAI notebooks and update the implementation if necessary. As far as I know, each request's tokens are calculated this way: Prompt tokens + Max tokens = request total tokens.

jucor commented 1 year ago

Thanks. It looks like there are two contradictory implementations from OpenAI: one in the notebook, the other in their batch call. They differ not just in the accounting for max_tokens, but also in how they handle the role names depending on the model (admittedly, a rather smaller factor!)
