The `max_tokens` parameter in `completion()` is documented as the maximum number of completion tokens requested. However, starting at litellm/utils.py:821, the calculation of `max_tokens` for the API call doesn't reflect this behaviour.
In utils.py, `max_output_tokens` is obtained from `get_max_tokens()`, which is correct, but later in the calculation it is treated more like a `max_context_size`: the size of the user input is subtracted from it before the API call is made (lines 842-843).
Ironically, if the user input is larger than `max_output_tokens`, the call often completes correctly, because the subtraction is skipped and the call is simply expected to fail (lines 840-841):

```python
if input_tokens > max_output_tokens:
    pass  # allow call to fail normally
```
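The behaviour described above can be sketched as follows. This is a simplified paraphrase of the logic around litellm/utils.py:840-843, not the exact code; the function name `compute_request_max_tokens` is hypothetical.

```python
from typing import Optional


def compute_request_max_tokens(input_tokens: int, max_output_tokens: int) -> Optional[int]:
    """Paraphrase of the max_tokens adjustment around litellm/utils.py:840-843.

    max_output_tokens is the model's completion limit from get_max_tokens().
    """
    if input_tokens > max_output_tokens:
        # Lines 840-841: no adjustment is made; the call is expected to
        # fail on its own (and ironically often succeeds instead).
        return None
    # Lines 842-843 (the bug): max_output_tokens is treated like a context
    # size, so the prompt length is subtracted from the completion budget.
    return max_output_tokens - input_tokens
```

With a 4096-token output limit, a 4000-token prompt yields a 96-token completion budget, while a 5000-token prompt skips the adjustment entirely.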
What happened?
What happens:
For a hypothetical gpt-4o call with an input of 4000 tokens, the completion will be capped at ~96 tokens.
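The numbers above imply that `get_max_tokens()` was reporting a 4096-token output limit for gpt-4o; under that assumption, the arithmetic behind the ~96-token cap is:

```python
# Hypothetical figures implied by the example above (not measured values):
max_output_tokens = 4096  # assumed gpt-4o completion limit from get_max_tokens()
input_tokens = 4000       # size of the user's prompt

# The buggy subtraction leaves only the "remaining context" as the
# completion budget, instead of the full completion limit:
max_tokens = max_output_tokens - input_tokens  # 96
```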
Relevant log output
No response
Twitter / LinkedIn details
No response