BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Accurate token count for claude-3 streaming models #2417

Open atishay-sarvam opened 3 months ago

atishay-sarvam commented 3 months ago

The Feature

Hi!

The tokenizer you are using for claude-3 is not accurate; the correct numbers are emitted in the stream chunks (the first chunk carries the prompt token count and the last chunk carries the response token count). The proposal is to use those numbers instead.
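
For reference, this is roughly where those numbers show up when consuming the stream with the Anthropic Python SDK (the model name and prompt below are just placeholders):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

stream = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

input_tokens = output_tokens = 0
for event in stream:
    if event.type == "message_start":
        # first event: authoritative prompt token count
        input_tokens = event.message.usage.input_tokens
    elif event.type == "message_delta":
        # final event: authoritative completion token count
        output_tokens = event.usage.output_tokens

print(input_tokens, output_tokens)
```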

Thanks

Motivation, pitch

https://docs.anthropic.com/claude/reference/messages-streaming

I would like to get the number of tokens used by the model from the actual values reported by the API instead of an estimate from another tokenizer (Claude's tokenizer changed between v2 and v3).

krrishdholakia commented 3 months ago

Hey @atishay-sarvam, great idea.

How do you want to receive the tokens for streaming? (is this via the custom logger?)

krrishdholakia commented 3 months ago

Looking at this issue - https://github.com/anthropics/anthropic-sdk-python/issues/353 - and the client SDK file, it looks like they no longer make their tokenizer available for use. I think we can solve this pretty easily for Anthropic streaming calls with a custom logger -> save the usage values from the stream and use them while rebuilding the message that gets emitted for log_success_event.
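
A rough sketch of what that could look like on your side with litellm's custom logger - this assumes the usage values captured from Anthropic's first/last chunks get attached to the rebuilt streaming response (the logger class name here is just an example):

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger


class TokenUsageLogger(CustomLogger):
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        # for streaming calls, the rebuilt response is passed via kwargs once
        # the stream has been fully consumed
        full_response = kwargs.get("complete_streaming_response") or response_obj
        usage = getattr(full_response, "usage", None)
        if usage is not None:
            print("prompt_tokens:", usage.prompt_tokens,
                  "completion_tokens:", usage.completion_tokens)


litellm.callbacks = [TokenUsageLogger()]
```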

Would that work? @atishay-sarvam

(screenshot attached)

atishay-sarvam commented 3 months ago

Sure, sounds good - though alternatively you could propagate the tokens directly in the respective messages.

Currently, Anthropic shares the token counts in the first message (input tokens) and the last message (output tokens). If you allowed those values to pass through on the stream, that would be helpful - rough sketch below.
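
Something like this is what I have in mind - a hypothetical `usage` field surfaced on the streamed chunks themselves, not something litellm exposes today:

```python
import litellm

response = litellm.completion(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

prompt_tokens = completion_tokens = None
for chunk in response:
    usage = getattr(chunk, "usage", None)  # hypothetical passthrough field
    if usage is not None:
        # first chunk would carry input tokens, the last chunk output tokens
        prompt_tokens = getattr(usage, "prompt_tokens", None) or prompt_tokens
        completion_tokens = getattr(usage, "completion_tokens", None) or completion_tokens

print(prompt_tokens, completion_tokens)
```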

I am not using the custom logger.