hdnh2006 opened 1 day ago
Confirmed. It doesn't work even if we edit the model_prices_and_context_window.json file directly and insert new models (if you're using Docker you need to mount the file as a volume or rebuild the container with the new file; see the compose sketch after the logs below). What is stranger: Jamba (from AI21 Studio) is supported and recognized out of the box by LiteLLM, yet it still throws an error:
{"message": "An error occurs - This model isn't mapped yet. model=jamba-1.5-mini, custom_llm_provider=ai21_chat. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.", "level": "ERROR", "timestamp": "2024-10-17T16:14:59.436564"}
{"message": "An error occurs - This model isn't mapped yet. model=ai21-jamba-1.5-mini, custom_llm_provider=None. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.", "level": "ERROR", "timestamp": "2024-10-17T16:14:59.437668"}
Even with a JSON models file containing:
```json
{
  "id": "jamba-1-5-large",
  "name": "AI21: Jamba 1.5 Large",
  "created": 1724371200,
  "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.000002",
    "completion": "0.000008",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "jamba-1-5-mini",
  "name": "AI21: Jamba 1.5 Mini",
  "created": 1724371200,
  "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.0000002",
    "completion": "0.0000004",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21/jamba-1-5-large",
  "name": "AI21: Jamba 1.5 Large",
  "created": 1724371200,
  "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.000002",
    "completion": "0.000008",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21/jamba-1-5-mini",
  "name": "AI21: Jamba 1.5 Mini",
  "created": 1724371200,
  "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.0000002",
    "completion": "0.0000004",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21_chat/jamba-1-5-large",
  "name": "AI21: Jamba 1.5 Large",
  "created": 1724371200,
  "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.000002",
    "completion": "0.000008",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21_chat/jamba-1-5-mini",
  "name": "AI21: Jamba 1.5 Mini",
  "created": 1724371200,
  "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.0000002",
    "completion": "0.0000004",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21_studio/jamba-1-5-large",
  "name": "AI21: Jamba 1.5 Large",
  "created": 1724371200,
  "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.000002",
    "completion": "0.000008",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
{
  "id": "ai21_studio/jamba-1-5-mini",
  "name": "AI21: Jamba 1.5 Mini",
  "created": 1724371200,
  "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
  "context_length": 256000,
  "architecture": {
    "modality": "text->text",
    "tokenizer": "Other",
    "instruct_type": null
  },
  "pricing": {
    "prompt": "0.0000002",
    "completion": "0.0000004",
    "image": "0",
    "request": "0"
  },
  "top_provider": {
    "context_length": 256000,
    "max_completion_tokens": 4096,
    "is_moderated": false
  },
  "per_request_limits": null
},
```
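Note that the entries above follow an OpenRouter-style schema, while LiteLLM's model_prices_and_context_window.json uses its own. As a sketch, a Jamba entry in LiteLLM's schema might look like the following (token costs copied from the pricing above; the key name and field values are assumptions that should be checked against existing entries in that file, since JSON allows no inline comments):

```json
"ai21_chat/jamba-1.5-mini": {
  "max_tokens": 4096,
  "max_input_tokens": 256000,
  "max_output_tokens": 4096,
  "input_cost_per_token": 0.0000002,
  "output_cost_per_token": 0.0000004,
  "litellm_provider": "ai21_chat",
  "mode": "chat"
}
```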
OK @bgeneto, I was going crazy with this issue.
The LiteLLM team is super professional; I'm sure they'll find a quick fix.
What happened?
Hello!
I am trying to use an embedding model from Ollama, specifically bge-large. I have registered the model in my LiteLLM as you can see in the following image:
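In config.yaml terms, that registration corresponds roughly to the sketch below; the api_base is an assumption (Ollama's default local address):

```yaml
model_list:
  - model_name: bge-large            # the alias used in requests
    litellm_params:
      model: ollama/bge-large        # provider/model as LiteLLM expects it
      api_base: http://localhost:11434  # assumption: Ollama's default address
```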
However, I am unable to call it with a single POST request. I have tried it two ways: just bge-large, and as ollama/bge-large. A sketch of the kind of request is shown below:
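A minimal sketch of the request being sent, assuming the proxy listens on localhost:4000 and using a placeholder key:

```python
# Sketch only: proxy address and API key are placeholders.
import requests

resp = requests.post(
    "http://localhost:4000/v1/embeddings",  # LiteLLM's OpenAI-compatible embeddings route
    headers={"Authorization": "Bearer sk-1234"},  # placeholder master key
    json={"model": "bge-large", "input": ["hello world"]},  # also tried "ollama/bge-large"
)
print(resp.status_code, resp.json())
```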
I don't have problems if I call an OpenAI embedding model registered in my LiteLLM:
With an OpenAI model
What am I doing wrong?
Relevant log output
Just bge-large:
With ollama/bge-large: