dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Enable custom token counting #7263

Open luisquintanilla opened 1 month ago


Currently, `Tokenizer` computes counts strictly from the tokens it produces.

However, there are scenarios where library authors may need to implement their own token-counting logic.
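A minimal, language-neutral sketch of what such a hook could look like, written in Python for brevity. Every name here (`TokenCounter`, `CustomTokenCounter`, `TextInput`, `ImageInput`, `total_tokens`) is hypothetical and is not part of the ML.NET `Tokenizer` API. The whitespace counter only stands in for a real tokenizer, and the flat per-image rule is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Union


# Hypothetical input types (not ML.NET types).
@dataclass
class TextInput:
    text: str


@dataclass
class ImageInput:
    width: int
    height: int


Input = Union[TextInput, ImageInput]


class TokenCounter(Protocol):
    def count_tokens(self, item: Input) -> int: ...


class WhitespaceTokenCounter:
    """Stand-in for a real tokenizer: counts whitespace-separated tokens."""

    def count_tokens(self, item: Input) -> int:
        if not isinstance(item, TextInput):
            raise TypeError("this counter only handles text")
        return len(item.text.split())


class CustomTokenCounter:
    """Sends text to a tokenizer-backed counter and images to a caller-supplied rule."""

    def __init__(self, text_counter: TokenCounter,
                 image_rule: Callable[[ImageInput], int]):
        self._text_counter = text_counter
        self._image_rule = image_rule

    def count_tokens(self, item: Input) -> int:
        if isinstance(item, ImageInput):
            # Images are not tokenized; the provider-specific rule decides the count.
            return self._image_rule(item)
        return self._text_counter.count_tokens(item)


def total_tokens(counter: TokenCounter, items: list[Input]) -> int:
    """Sums the counts for a mixed list of text and image inputs."""
    return sum(counter.count_tokens(item) for item in items)
```

With a placeholder rule of 85 tokens per image, `total_tokens(CustomTokenCounter(WhitespaceTokenCounter(), lambda img: 85), [TextInput("describe this image"), ImageInput(512, 512)])` gives 3 + 85 = 88. Keeping the image rule as a plain callable means the tokenizer does not need to know anything about any particular provider's pricing.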

One such scenario is providing token counts for image inputs. For images, AI service providers do not tokenize the input; instead, each defines its own rule for calculating cost in terms of a fixed token count.

| Provider | Image cost calculation | Uses tokenization | Link |
|---|---|---|---|
| OpenAI | Fixed | No | https://platform.openai.com/docs/guides/vision/calculating-costs |
| Claude | Fixed | No | https://docs.anthropic.com/en/docs/build-with-claude/vision#calculate-image-costs |
| Gemini | Fixed | No | https://ai.google.dev/gemini-api/docs/tokens?lang=python#multimodal-tokens |
| Cohere | N/A | N/A | https://docs.cohere.com/docs/tokens |
| Mistral | N/A | N/A | https://docs.mistral.ai/guides/tokenization/#tokens-count |
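To make the "fixed" rule concrete, here is a sketch of the tile-based calculation described on OpenAI's linked vision page, as we understand it: 85 tokens flat in low detail; in high detail, the image is fit inside 2048 x 2048, scaled so its shortest side is 768 px, and charged 85 base tokens plus 170 per 512 px tile. The function name is hypothetical, the constants are model-specific and may change, and treating step 2 as downscale-only is an assumption; check the current page before relying on it.

```python
import math

BASE_TOKENS = 85       # charged for every image (and the whole cost in "low" detail)
TOKENS_PER_TILE = 170  # charged per 512 px tile in "high" detail


def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    """Hypothetical helper: estimate image tokens using OpenAI's documented tile rule."""
    if detail == "low":
        return BASE_TOKENS
    w, h = float(width), float(height)
    # 1) Fit inside a 2048 x 2048 square, preserving the aspect ratio.
    if max(w, h) > 2048:
        scale = 2048 / max(w, h)
        w, h = w * scale, h * scale
    # 2) Scale so the shortest side is 768 px (assumption: only ever downscale).
    if min(w, h) > 768:
        scale = 768 / min(w, h)
        w, h = w * scale, h * scale
    # 3) Count the 512 px tiles needed to cover the scaled image.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return BASE_TOKENS + TOKENS_PER_TILE * tiles
```

For example, a 1024 x 1024 image in high detail is scaled to 768 x 768, which needs 2 x 2 = 4 tiles, so it costs 85 + 4 x 170 = 765 tokens. No tokenizer is involved at any step, which is why a token-only `Tokenizer` count cannot express it.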