Closed: Elijas closed this 7 months ago
Hi,
The max_tokens parameter does not affect the functionality of the package; it is only used when calculating the number of tokens in the request. That counting algorithm was taken from a notebook published earlier by OpenAI, and the value you pass depends on how large you expect the output to be.
Please note that I haven't updated the package for months, so OpenAI may have changed how this is calculated, but the package can still be helpful for limiting requests, just not precisely.
So I suggest you first test whether the package behaves as you expect :)
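If it helps, the estimate is roughly this shape (a minimal sketch in the spirit of that notebook, not the package's exact code; the per-message overhead constants here are assumptions and vary by model):

import tiktoken

def estimate_request_tokens(messages, max_tokens, model="gpt-3.5-turbo"):
    # Count the prompt tokens roughly the way OpenAI's notebook does,
    # then add max_tokens as the worst-case size of the completion.
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # assumed per-message overhead (role, separators)
    num_tokens = 3          # assumed priming tokens for the assistant reply
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    return num_tokens + max_tokens

# e.g. estimate_request_tokens([{"role": "user", "content": "Hello!"}], max_tokens=200)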
Thanks for a quick reply!
I see, so from what I understand max_tokens is basically "guess how many output tokens the model will generate and this will count towards the token limit".
So for example, if OpenAI is limited to 200 output tokens, then setting max_tokens to 200 would account for the worst possible case.
Ideally, after the generation is complete, that value (200) would be replaced in the limiter with the actual output token count, I suppose. I know that OpenAI also gives "used tokens" and "available tokens left" in the response headers, but I'm not sure whether other LLM vendors do this too, so tracking the tokens on the client side allows for more portable code.
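For example (assuming the openai Python SDK v1+, where with_raw_response exposes the headers; the x-ratelimit-* header names are OpenAI-specific and other vendors may not send them), reading both could look roughly like this:

from openai import OpenAI

client = OpenAI()

# Raw response access, so both the parsed completion and the headers are available
raw = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=200,
)
completion = raw.parse()

# Portable across vendors: read the output size from the usage object (or count with tiktoken)
actual_output_tokens = completion.usage.completion_tokens

# OpenAI-specific: remaining budget as reported by the server
remaining_tokens = raw.headers.get("x-ratelimit-remaining-tokens")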
Either way, thanks for the library; it provides a quick way to get started 👍 🚀
Redis has incrby. This means that such an approach should theoretically be possible: after the request completes, adjust the counter by the actual - reserved amount of tokens. If the value is positive, then additional tokens are taken into account. If the value is negative, then unused tokens are freed. E.g. (a very rough hacky example):
reserved_tokens = 300
with limiter.limit(messages=..., max_tokens=reserved_tokens):
    response = ...  # make request

# Adjust reserved tokens based on actual consumption
actual_output_tokens = get_output_token_count(response)
adjustment = actual_output_tokens - reserved_tokens
await limiter.redis.incrby(f"{limiter.model_name}_api_tokens", adjustment)
Of course, this solution has plenty of unaccounted-for edge cases; it was just a back-of-a-napkin example.
--
just a few raw thoughts regarding the limiter 👍
I will add this for future reference. The following draft code seems to work great so far; it only adjusts the counter to the actual consumption if the key hasn't expired yet (as a transaction, to avoid race conditions).
import tiktoken

reserved_tokens = 300
with limiter.limit(messages=..., max_tokens=reserved_tokens):
    response = ...  # make request

result = response.choices[0].message.content
actual_tokens = len(tiktoken.get_encoding("cl100k_base").encode(result))
adjustment = actual_tokens - reserved_tokens
used_tokens = await self.incr_if_exists(adjustment)
async def incr_if_exists(self, adjustment: int) -> int | None:
    # Atomically adjust the token counter, but only if the key still exists
    # (i.e. the current rate-limit window hasn't expired yet).
    lua_script = """
    if redis.call('exists', KEYS[1]) == 1 then
        return redis.call('incrby', KEYS[1], ARGV[1])
    else
        return nil
    end
    """
    key = f"{self._limiter.model_name}_api_tokens"
    return await self._limiter.redis.eval(lua_script, 1, key, str(adjustment))
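One note on why the exists check is there: a plain incrby on a key whose window has already expired would recreate the key with no TTL, so the adjustment would leak into the next window. Doing the check and the increment inside a single Lua script keeps them atomic, which is the race condition mentioned above.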
(Sorry for the code quality, it was a very quick-n-dirty draft.)
Hello,
Firstly, I noticed a discrepancy in the max_tokens value—15 in the source code versus 175 in the documentation examples. Could you advise on how this value is derived?
Thank you for your support.