BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

Is litellm.encode() accurate for Claude 3.5 Sonnet? #4347

Closed paul-gauthier closed 3 months ago

paul-gauthier commented 3 months ago

What happened?

I am getting user reports that Sonnet will sometimes stop generating tokens with an error indicating a token limit was hit. Aider reports only ~3k output tokens from the model, yet Sonnet's output token limit is 4k. This has been hard to understand.

Aider uses litellm.encode() to count how many output tokens have been returned. Any chance it is using a wrong/approximate tokenizer and is therefore undercounting the tokens?

See https://github.com/paul-gauthier/aider/issues/705 for an example user report of this issue.

Relevant log output

No response

Twitter / LinkedIn details

No response

krrishdholakia commented 3 months ago

anthropic doesn't expose a tokenizer for claude-3, see: https://github.com/anthropics/anthropic-sdk-python/issues/375#issuecomment-1999982035

we're currently defaulting claude-3 to tiktoken - see here: https://github.com/BerriAI/litellm/blob/6b63b663b9de89139dd28203650f8443c39b6d9e/litellm/utils.py#L1479

I'd recommend keeping a buffer, as it's possible there's a gap in accuracy
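The buffer suggestion could look like the sketch below. The helper name and the 15% margin are assumptions for illustration, not part of litellm; the idea is simply to inflate an approximate count before comparing it to a hard output limit, since tiktoken is only a stand-in for Claude's real tokenizer and may undercount.

```python
# Hypothetical helper (not a litellm API): pad an approximate token count
# with a safety margin to guard against a best-effort tokenizer undercounting.
def padded_token_count(approx_count: int, margin: float = 0.15) -> int:
    """Return approx_count inflated by `margin`, always at least one token higher."""
    return int(approx_count * (1 + margin)) + 1


# Example: treat a ~3k approximate count as potentially closer to the 4k limit.
padded = padded_token_count(3000)
assert padded > 3000
```

The margin itself is a judgment call; a caller could tune it per model family based on observed divergence.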

@paul-gauthier open to feedback on how we can do this better

paul-gauthier commented 3 months ago

Thanks for the reply. This is what I suspected, but wanted to confirm. Nothing much else to be done until Anthropic provides a tokenizer.

krrishdholakia commented 3 months ago

how're you planning on dealing with this? Wondering if there's anything we can do to help @paul-gauthier

paul-gauthier commented 3 months ago

One thing that might be nice is to provide a way for the caller to find out whether the token counts are accurate or approximate.

Maybe something like this?

accurate = litellm.encode_is_accurate(model)
# True if tokenizer is known to be correct
# False if using a "best effort" approximate tokenizer
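A minimal sketch of how such a check might work, assuming a hand-maintained allowlist of model families whose tokenizer litellm knows to be exact (the function name and prefix list here are hypothetical, not an existing litellm API):

```python
# Hypothetical implementation of the proposed encode_is_accurate() check.
# Model families with a known-exact tokenizer (e.g. tiktoken for OpenAI models);
# everything else falls back to a best-effort approximation.
EXACT_TOKENIZER_PREFIXES = ("gpt-3.5", "gpt-4", "text-embedding")


def encode_is_accurate(model: str) -> bool:
    """True if the tokenizer for `model` is known to be correct,
    False if counting falls back to an approximate tokenizer."""
    return model.startswith(EXACT_TOKENIZER_PREFIXES)


assert encode_is_accurate("gpt-4o") is True
assert encode_is_accurate("claude-3-5-sonnet-20240620") is False
```

A caller like aider could then decide at runtime whether to apply a safety buffer when comparing counts against a model's output limit.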