BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: LiteLLM is not recognizing the provider or model #6287

Open hdnh2006 opened 1 day ago

hdnh2006 commented 1 day ago

What happened?

Hello!

I am trying to use an embedding model from Ollama, specifically bge-large. I have registered the model in my LiteLLM proxy, as shown in the attached screenshot.

However, I am unable to call it with a plain POST request. I have tried the following two ways:

Just bge-large

import requests
import json

# Set API endpoint URL
url = "https://mylitellmurl/embeddings"

# Set API authentication token
auth_token = "sk-Myauthtoken"

# Set API request headers
headers = {
    "Authorization": f"Bearer {auth_token}",
    "Content-Type": "application/json"
}

payload = {
    "model": "bge-large",
    "input": ["why is the sky blue?", "why is the grass green?"]
}

# Convert payload to JSON string
payload_json = json.dumps(payload)

# Make POST request to API endpoint
response = requests.post(url, headers=headers, data=payload_json)

# Check if request was successful
if response.status_code == 200:
    print("Request successful!")
    print(response.json())
else:
    print("Request failed with error:", response.text)

...
Request failed with error: {"error":{"message":"litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=bge-large\n Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers\nReceived Model Group=bge-large\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400"}}
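For context on this first error: when LiteLLM receives a bare model name it has no mapping for, it expects a provider prefix so it knows which backend to route to. A minimal sketch of a provider-prefixed call through the Python SDK (this bypasses the proxy and talks to Ollama directly; the `api_base` below is an assumption about where the Ollama server runs, not something taken from the setup above):

```python
import litellm

# Sketch only: calls Ollama directly through the LiteLLM SDK, not the proxy.
# Assumes an Ollama server is reachable on its default port (11434).
response = litellm.embedding(
    model="ollama/bge-large",  # "ollama/" prefix tells LiteLLM which provider to use
    input=["why is the sky blue?", "why is the grass green?"],
    api_base="http://localhost:11434",  # assumed Ollama endpoint
)
print(response)
```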

with ollama/bge-large:

import requests
import json

# Set API endpoint URL
url = "https://mylitellmurl/embeddings"

# Set API authentication token
auth_token = "sk-Myauthtoken"

# Set API request headers
headers = {
    "Authorization": f"Bearer {auth_token}",
    "Content-Type": "application/json"
}

payload = {
    "model": "ollama/bge-large",
    "input": ["why is the sky blue?", "why is the grass green?"]
}

# Convert payload to JSON string
payload_json = json.dumps(payload)

# Make POST request to API endpoint
response = requests.post(url, headers=headers, data=payload_json)

# Check if request was successful
if response.status_code == 200:
    print("Request successful!")
    print(response.json())
else:
    print("Request failed with error:", response.text)

Request failed with error: {"error":{"message":"No deployments available for selected model. You passed in model=ollama/bge-large. There is no 'model_name' with this string ","type":"None","param":"None","code":"429"}}
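For context on this second error: the proxy routes requests by matching the request's `model` field against the `model_name` entries registered in its model list, not against the provider-prefixed model string. A minimal Router sketch illustrating that mapping (the `api_base` is again an assumption; in the proxy, the same mapping would live in its configuration):

```python
from litellm import Router

# Sketch only: the client-facing name ("bge-large") is the model_name;
# the provider-prefixed string goes inside litellm_params.
router = Router(
    model_list=[
        {
            "model_name": "bge-large",  # what clients put in the request's "model" field
            "litellm_params": {
                "model": "ollama/bge-large",           # provider-prefixed deployment
                "api_base": "http://localhost:11434",  # assumed Ollama endpoint
            },
        }
    ]
)

# Requests should use the registered model_name, not "ollama/bge-large".
resp = router.embedding(model="bge-large", input=["why is the sky blue?"])
print(resp)
```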

I don't have any problems if I call an OpenAI embedding model registered in my LiteLLM proxy (screenshot attached):

With an OpenAI model:


import requests
import json

# Set API endpoint URL
url = "https://mylitellmurl/embeddings"

# Set API authentication token
auth_token = "sk-Myauthtoken"

# Set API request headers
headers = {
    "Authorization": f"Bearer {auth_token}",
    "Content-Type": "application/json"
}

payload = {
    "model": "text-embedding-ada-002",
    "input": ["why is the sky blue?", "why is the grass green?"]
}

# Convert payload to JSON string
payload_json = json.dumps(payload)

# Make POST request to API endpoint
response = requests.post(url, headers=headers, data=payload_json)

# Check if request was successful
if response.status_code == 200:
    print("Request successful!")
    print(response.json())
else:
    print("Request failed with error:", response.text)

Request successful!
{'model': 'text-embedding-ada-002', 'data': [{'embedding': [0.030145293101668358, -0.005780410952866077, -0.0027748537249863148, -0.03870696574449539, -0.03573345020413399, 0.014431802555918694, ...
], 'index': 0, 'object': 'embedding'}, {'embedding': [0.02898677997291088, -0.014583651907742023, 0.01015439908951521, -0.030766217038035393, -0.017897531390190125, 0.018129631876945496, -0.022578226402401924, ... 
], 'index': 1, 'object': 'embedding'}], 'object': 'list', 'usage': {'completion_tokens': 0, 'prompt_tokens': 12, 'total_tokens': 12, 'completion_tokens_details': None}}

What am I doing wrong?

Relevant log output

Just bge-large

Request failed with error: {"error":{"message":"litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. You passed model=bge-large\n Pass model as E.g. For 'Huggingface' inference endpoints pass in `completion(model='huggingface/starcoder',..)` Learn more: https://docs.litellm.ai/docs/providers\nReceived Model Group=bge-large\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400"}}

With ollama/bge-large

Request failed with error: {"error":{"message":"No deployments available for selected model. You passed in model=ollama/bge-large. There is no 'model_name' with this string ","type":"None","param":"None","code":"429"}}


bgeneto commented 1 day ago

Confirmed. It doesn't work even if we edit the model_prices_and_context_window.json file directly and insert new models (if you're running under Docker, you need to mount the file as a volume or rebuild the container with the updated file). What is stranger: Jamba (from AI21 Studio) is supported and recognized out of the box by LiteLLM, but it still gives an error:

{"message": "An error occurs - This model isn't mapped yet. model=jamba-1.5-mini, custom_llm_provider=ai21_chat. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.", "level": "ERROR", "timestamp": "2024-10-17T16:14:59.436564"}
{"message": "An error occurs - This model isn't mapped yet. model=ai21-jamba-1.5-mini, custom_llm_provider=None. Add it here - https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json.", "level": "ERROR", "timestamp": "2024-10-17T16:14:59.437668"}

Even with a JSON models file containing:

     {
      "id": "jamba-1-5-large",
      "name": "AI21: Jamba 1.5 Large",
      "created": 1724371200,
      "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.000002",
        "completion": "0.000008",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "jamba-1-5-mini",
      "name": "AI21: Jamba 1.5 Mini",
      "created": 1724371200,
      "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000002",
        "completion": "0.0000004",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21/jamba-1-5-large",
      "name": "AI21: Jamba 1.5 Large",
      "created": 1724371200,
      "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.000002",
        "completion": "0.000008",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21/jamba-1-5-mini",
      "name": "AI21: Jamba 1.5 Mini",
      "created": 1724371200,
      "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000002",
        "completion": "0.0000004",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21_chat/jamba-1-5-large",
      "name": "AI21: Jamba 1.5 Large",
      "created": 1724371200,
      "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.000002",
        "completion": "0.000008",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21_chat/jamba-1-5-mini",
      "name": "AI21: Jamba 1.5 Mini",
      "created": 1724371200,
      "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000002",
        "completion": "0.0000004",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21_studio/jamba-1-5-large",
      "name": "AI21: Jamba 1.5 Large",
      "created": 1724371200,
      "description": "Jamba 1.5 Large is part of AI21's new family of open models, offering superior speed, efficiency, and quality.\n\nIt features a 256K effective context window, the longest among open models, enabling improved performance on tasks like document summarization and analysis.\n\nBuilt on a novel SSM-Transformer architecture, it outperforms larger models like Llama 3.1 70B on benchmarks while maintaining resource efficiency.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.000002",
        "completion": "0.000008",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },
    {
      "id": "ai21_studio/jamba-1-5-mini",
      "name": "AI21: Jamba 1.5 Mini",
      "created": 1724371200,
      "description": "Jamba 1.5 Mini is the world's first production-grade Mamba-based model, combining SSM and Transformer architectures for a 256K context window and high efficiency.\n\nIt works with 9 languages and can handle various writing and analysis tasks as well as or better than similar small models.\n\nThis model uses less computer memory and works faster with longer texts than previous designs.\n\nRead their [announcement](https://www.ai21.com/blog/announcing-jamba-model-family) to learn more.",
      "context_length": 256000,
      "architecture": {
        "modality": "text->text",
        "tokenizer": "Other",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000002",
        "completion": "0.0000004",
        "image": "0",
        "request": "0"
      },
      "top_provider": {
        "context_length": 256000,
        "max_completion_tokens": 4096,
        "is_moderated": false
      },
      "per_request_limits": null
    },    
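Note that the entries above follow an OpenRouter-style schema; LiteLLM's model_prices_and_context_window.json entries use different keys (max_tokens, input_cost_per_token, litellm_provider, mode, etc.). A minimal sketch of registering a Jamba entry in that format through the SDK, with illustrative cost values taken from the pricing above:

```python
import litellm

# Sketch only: LiteLLM cost-map entries use keys like max_tokens,
# input_cost_per_token, output_cost_per_token, litellm_provider and mode.
# The values below are illustrative, not official pricing.
litellm.register_model({
    "ai21_chat/jamba-1.5-mini": {
        "max_tokens": 256000,
        "max_input_tokens": 256000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0.0000002,
        "output_cost_per_token": 0.0000004,
        "litellm_provider": "ai21_chat",
        "mode": "chat",
    }
})
```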
hdnh2006 commented 1 day ago

Ok @bgeneto, I was going crazy with this issue.

The LiteLLM team is super pro; I'm sure they will find a fix quickly.