[Open] krrishdholakia opened this issue 7 months ago
At the risk of greatly changing the scope of this bug report, I'd like to mention that the `_select_tokenizer` method only selects the correct tokenizer for a few hard-coded model classes (namely Cohere, Anthropic, LLaMA2, and OpenAI); for every other model (including Mistral) it incorrectly defaults to tiktoken.
open to suggestions - how can we improve this? @GlavitsBalazs
As a fallback, I would add a `Tokenizer.from_pretrained` call inside a try-except in `_select_tokenizer` to check whether the model is available on Hugging Face, and use the HF tokenizer if nothing better is available. This would work for `huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1` or `togetherai/mistralai/Mixtral-8x7B-Instruct-v0.1`, for example.
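A minimal sketch of that fallback, assuming `_select_tokenizer` keeps its current return shape (a dict with `type` and `tokenizer` keys) and using a naive prefix-stripping rule:

```python
import tiktoken
from tokenizers import Tokenizer

encoding = tiktoken.get_encoding("cl100k_base")  # current tiktoken default

def _select_tokenizer(model: str):
    # ... existing hard-coded branches (Cohere, Anthropic, LLaMA2, OpenAI) ...

    # Fallback: drop a provider prefix such as "huggingface/" or "togetherai/"
    # so the remainder ("mistralai/Mixtral-8x7B-Instruct-v0.1") is an HF repo id.
    repo_id = model.split("/", 1)[1] if "/" in model else model
    try:
        # Raises if the repo doesn't exist (or there is no network access).
        tokenizer = Tokenizer.from_pretrained(repo_id)
        return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}
    except Exception:
        # Nothing better available - keep the current tiktoken default.
        return {"type": "openai_tokenizer", "tokenizer": encoding}
```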
For models such as `bedrock/mistral.mixtral-8x7b-instruct-v0:1`, there is no way to automatically deduce the correct tokenizer, so I would set `litellm.model_cost["bedrock/mistral.mixtral-8x7b-instruct-v0:1"]["huggingface_tokenizer"] = "mistralai/Mixtral-8x7B-Instruct-v0.1"` via an entry in `model_prices_and_context_window.json`. This could be done for many other models, such as Cohere, Mistral, and LLaMA. Other options for the name `huggingface_tokenizer` could be `huggingface_tokenizer_name`, `huggingface_tokenizer_repo`, or `huggingface_tokenizer_model_name_or_path`.
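Concretely, the entry in `model_prices_and_context_window.json` might look something like this (`litellm_provider` and `mode` already exist in that file's schema; `huggingface_tokenizer` is the proposed addition, and the existing pricing fields are omitted here):

```json
"bedrock/mistral.mixtral-8x7b-instruct-v0:1": {
    "litellm_provider": "bedrock",
    "mode": "chat",
    "huggingface_tokenizer": "mistralai/Mixtral-8x7B-Instruct-v0.1"
}
```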
Then `_select_tokenizer` would call `litellm.get_model_info`, and if the info has a `"huggingface_tokenizer"` we can fetch that and be sure that it's correct. Users could even customize this via `litellm.register_model`.
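A sketch of the user-facing side, assuming `register_model` merges the entry into `litellm.model_cost` the same way it does for cost data today (again, the `huggingface_tokenizer` key is the proposed part):

```python
import litellm

litellm.register_model({
    "bedrock/mistral.mixtral-8x7b-instruct-v0:1": {
        "litellm_provider": "bedrock",
        "mode": "chat",
        # Proposed key - tells the token counter which HF tokenizer to load.
        "huggingface_tokenizer": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    }
})

# _select_tokenizer could then resolve the repo id before falling back:
repo_id = litellm.model_cost[
    "bedrock/mistral.mixtral-8x7b-instruct-v0:1"
].get("huggingface_tokenizer")  # -> "mistralai/Mixtral-8x7B-Instruct-v0.1"
```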
Finally, I would add a `functools.lru_cache` decorator to `_select_tokenizer`, so that we don't have to load the tokenizer from disk or send network requests every time someone wants to tokenize something.
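Since selection is keyed only by the (hashable) model name string, that part is a one-line change:

```python
import functools

@functools.lru_cache(maxsize=None)  # one cached result per model name
def _select_tokenizer(model: str):
    ...  # tokenizer selection logic as above
```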
What happened?
Request to litellm:

```python
litellm.completion(
    model='bedrock/mistral.mixtral-8x7b-instruct-v0:1',
    messages=[{'role': 'user', 'content': 'Are you here? Answer "Yes."'}],
    max_tokens=3,
    stream=True,
)
```
cc: @GlavitsBalazs