BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Bug]: Completion tokens are incorrectly calculated #4439

Closed · JosephDavidsonKSWH closed this 3 days ago

JosephDavidsonKSWH commented 4 days ago

What happened?

For a hypothetical gpt-4o call with an input size of 4000 tokens:

litellm.completion(model="gpt-4o", messages=messages, max_tokens=4096)

The completion will be capped at ~96 tokens (gpt-4o's 4096-token output limit minus the ~4000-token prompt) rather than being allowed the requested 4096.
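The cap shows up directly on the returned usage object (a hypothetical snippet; the placeholder prompt is an assumption, and usage follows the OpenAI response format litellm returns):

import litellm

messages = [{"role": "user", "content": "<a prompt of roughly 4000 tokens>"}]
response = litellm.completion(model="gpt-4o", messages=messages, max_tokens=4096)
# Expected: up to 4096 completion tokens; observed: capped near 96.
print(response.usage.completion_tokens)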

Detailed Description

The max_tokens parameter in completion() is documented as the maximum number of completion tokens to request. However, starting at litellm/utils.py:821, the max_tokens value computed for the API call does not match this behaviour.

In utils.py, max_output_tokens is obtained from get_max_tokens(), which is correct, but later in the calculation it is treated like a max_context_size: the size of the user input is subtracted from it before the API call is made (lines 842-843):

elif user_max_tokens + input_tokens > max_output_tokens:
    user_max_tokens = max_output_tokens - input_tokens

Ironically, if the user input alone is larger than max_output_tokens, the call often completes correctly, because that subtraction never happens and the call is simply left to fail on its own (lines 840-841):

if input_tokens > max_output_tokens:
    pass  # allow call to fail normally
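Putting the two branches together, here is a standalone sketch of the logic described above (not the actual litellm code path; the 4096 figure assumes gpt-4o's output-token limit as returned by get_max_tokens()):

def effective_max_tokens(user_max_tokens, input_tokens, max_output_tokens):
    # Sketch of the clamping at utils.py lines 840-843.
    if input_tokens > max_output_tokens:
        pass  # allow call to fail normally
    elif user_max_tokens + input_tokens > max_output_tokens:
        # Bug: the model's *output* limit is treated as a *context* limit,
        # so the prompt size eats into the requested completion budget.
        user_max_tokens = max_output_tokens - input_tokens
    return user_max_tokens

print(effective_max_tokens(4096, 4000, 4096))  # 96   <- the capped example above
print(effective_max_tokens(4096, 5000, 4096))  # 4096 <- the "ironic" branch: clamp skipped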

Relevant log output

No response


krrishdholakia commented 3 days ago

Hey @JosephDavidsonKSWH, fix here: https://github.com/BerriAI/litellm/pull/4446

Curious - are you using the proxy or the SDK here?