BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Proper Cost Tracking for Google Gemini Models #4165

Closed: emerzon closed this issue 3 months ago

emerzon commented 3 months ago

The Feature

Today, the cost for Gemini models at https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json is marked per token.

There are two issues, however:

1. Cost per token is not a suitable basis, as Gemini (on Vertex AI) bills per character rather than per token.
2. Vertex AI applies different pricing tiers for prompts under and over 128k tokens.

Would it make sense to have some specialized billing metrics that take into account the exact number of characters and the context size?

Motivation, pitch

At high usage volumes, the estimated cost can drift considerably from the real value.

Twitter / LinkedIn details

@emersongomesma

krrishdholakia commented 3 months ago

> Cost per token is not suitable as Gemini bills per chars instead of tokens

Yes, we use the OpenAI conversion of ~~1 char = 4 tokens~~ 1 token = 4 chars.

What would make sense here, @emerzon, since Vertex AI gives usage at a token level as well?

krrishdholakia commented 3 months ago

we do have specialized cost calculation based on token / time for some providers - https://github.com/BerriAI/litellm/blob/2a3281796333e51563a8ba3a67b3d2c64cdd04f9/litellm/cost_calculator.py#L289
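
For illustration, a rough sketch of what token- plus time-based pricing can look like (this is not litellm's actual cost_calculator code; the function and parameter names below are assumptions):

```python
# Rough illustration, not litellm's implementation: some providers price the
# completion by generation time rather than by output tokens.
def response_cost(
    prompt_tokens: int,
    input_cost_per_token: float,
    completion_time_seconds: float,
    output_cost_per_second: float,
) -> float:
    prompt_cost = prompt_tokens * input_cost_per_token                   # token-based input pricing
    completion_cost = completion_time_seconds * output_cost_per_second   # time-based output pricing
    return prompt_cost + completion_cost
```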

emerzon commented 3 months ago

Hi, I will seek clarification; the documentation from Google is contradictory, in some parts mentioning billing per character and in others per token. I will reopen the case later, once this has been clarified. Thank you!

krrishdholakia commented 3 months ago

looks like vertex ai uses the same conversion of tokens to characters as us (1 char = 4 tokens)

(screenshot: Vertex AI documentation, 2024-06-17)

emerzon commented 3 months ago

This 4-chars-per-token figure is a very rough approximation; especially for non-English languages, the ratio can differ significantly.

I have written a small script to calculate the char-to-token ratios, using both the token count reported by the API and the count from tiktoken's cl100k_base encoding, purely for comparison purposes.
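
A minimal sketch of that kind of comparison script (the exact script isn't shown here; the google-generativeai SDK, the gemini-1.5-pro model name, and the shortened sample texts are assumptions):

```python
import tiktoken
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")              # assumes a Gemini API key
model = genai.GenerativeModel("gemini-1.5-pro")      # model name is an assumption
encoding = tiktoken.get_encoding("cl100k_base")

samples = {
    "english": "The moon, a silver coin, hangs high, ...",   # shortened placeholders
    "portuguese": "O sol da tarde, em brasa, ...",
}

for language, text in samples.items():
    chars = len(text)
    api_tokens = model.count_tokens(text).total_tokens   # token count from the API
    tt_tokens = len(encoding.encode(text))                # local tiktoken count
    print(
        f"Language: {language} | Chars: {chars} | Tokens: {api_tokens} | "
        f"Ratio: {chars / api_tokens} | TT Tokens: {tt_tokens} | "
        f"TT Token Ratio: {chars / tt_tokens}"
    )
    print("-" * 80)
```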

Here's a sample run:

The moon, a silver coin, hangs high,
Upon a velvet, midnight sky.
A lullaby the wind does sigh,
As stars like diamonds softly lie. 

Language: english | Chars: 111 | Tokens: 37 | Ratio: 3.0 | TT Tokens: 34 | TT Token Ratio: 3.264705882352941
--------------------------------------------------------------------------------
O sol da tarde, em brasa,
Põe o céu em aquarela.
Nuvens cor de algodão doce,
Dançam sobre a mata bela.

Um canto de pássaro,
Na melodia do vento.
Um instante de paz,
No ritmo do momento. 

Language: portuguese | Chars: 158 | Tokens: 61 | Ratio: 2.5901639344262297 | TT Tokens: 68 | TT Token Ratio: 2.323529411764706
--------------------------------------------------------------------------------
Kylmä tuuli puhaltaa,
Lumikiteet tanssivat,
Yö on hiljainen ja tyhjä,
Vain tähdet loistavat. 

Language: finnish | Chars: 84 | Tokens: 40 | Ratio: 2.1 | TT Tokens: 45 | TT Token Ratio: 1.8666666666666667
--------------------------------------------------------------------------------
Le vent siffle, une mélodie d'automne,
Les feuilles dansent, une valse de rouge et d'or,
Le soleil se couche, une teinte de feu,
Et la nuit s'installe, paisible et sonore. 

Language: french | Chars: 146 | Tokens: 57 | Ratio: 2.56140350877193 | TT Tokens: 59 | TT Token Ratio: 2.4745762711864407
--------------------------------------------------------------------------------
春の雨
花々を潤す
優しい音 

Language: japanese | Chars: 15 | Tokens: 13 | Ratio: 1.1538461538461537 | TT Tokens: 23 | TT Token Ratio: 0.6521739130434783
--------------------------------------------------------------------------------
落叶飘零,
秋风萧瑟,
寒意渐浓,
夜色朦胧。 

Language: chinese | Chars: 24 | Tokens: 25 | Ratio: 0.96 | TT Tokens: 36 | TT Token Ratio: 0.6666666666666666
--------------------------------------------------------------------------------

The second part of the problem has to do with the different pricing tiers on Vertex AI for prompts under and over 128k tokens.

For an accurate calculation, the totalTokens value from the API response should be used only to determine the pricing tier, and totalBillableCharacters for the actual price calculation.

An important side note: this only applies to Vertex AI; Google AI Studio uses regular per-token billing (hence the initial confusion).
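
For reference, a minimal sketch of the calculation described above (the function and parameter names are hypothetical, not litellm's API; only totalTokens and totalBillableCharacters come from the Vertex AI response):

```python
# Hypothetical sketch: pick the pricing tier from the token count, but bill by
# billable characters, as described above. Not litellm's actual implementation.
def vertex_gemini_prompt_cost(
    total_tokens: int,
    total_billable_characters: int,
    price_per_char_under_128k: float,
    price_per_char_over_128k: float,
) -> float:
    # totalTokens only determines which pricing tier applies...
    if total_tokens <= 128_000:
        price_per_char = price_per_char_under_128k
    else:
        price_per_char = price_per_char_over_128k
    # ...while totalBillableCharacters is what actually gets priced.
    return total_billable_characters * price_per_char
```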

guillaq commented 3 months ago

Hi! Maybe I am missing something, but the Google doc says 1 token = 4 chars, not 1 char = 4 tokens. Which means that if the cost is $0.00125 / 1k characters, then it is $0.005 / 1k tokens, no?
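
As a quick sanity check of that arithmetic (the figures are the example prices from the comment above, and the 4-chars-per-token factor is the rule of thumb being discussed, not an exact conversion):

```python
cost_per_1k_chars = 0.00125       # $ per 1,000 characters (example figure from above)
chars_per_token = 4               # rule-of-thumb conversion under discussion
cost_per_1k_tokens = cost_per_1k_chars * chars_per_token
print(cost_per_1k_tokens)         # 0.005 -> $0.005 per 1,000 tokens
```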

krrishdholakia commented 3 months ago

Great catch @guillaq. Fixed with #4291.

guillaq commented 3 months ago

Thank you !

krrishdholakia commented 3 months ago

Can we set up a support channel, @guillaq?

Would love to understand how you're using litellm today.

Discord - link

LinkedIn - link

yannbu commented 3 months ago

Hi @krrishdholakia, I'm Yann, @guillaq's colleague.

We may have found another small error in the prices: https://github.com/BerriAI/litellm/issues/4305

Happy to chat and tell you more about our usage of the lib!