BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Parameter based routing #3364

Open Manouchehri opened 2 months ago

Manouchehri commented 2 months ago

The Feature

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: europe-west2
      disallowed_parameters: {"response_format": '{"type": "json_object"}', "n": ">1"}

  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: gemini/gemini-1.5-pro-latest

For example, if this request comes in, route it to Vertex AI.

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "text"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'

If this request comes in, route it to Gemini (AI Studio).

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "json_object"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'
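
To make the proposed behavior concrete, here is a minimal Python sketch of how the disallowed_parameters matching could work. The function name and the rule syntax (a JSON-encoded literal means "must not equal this value", a string like ">1" means a numeric comparison) are assumptions for illustration, not existing LiteLLM behavior:

import json

def violates_disallowed_params(request_params: dict, disallowed: dict) -> bool:
    """Return True if the request hits any of the deployment's disallowed parameters.

    Assumed rule syntax: a string starting with ">" is a numeric comparison,
    anything else is a JSON-encoded literal compared for equality.
    """
    for param, rule in disallowed.items():
        if param not in request_params:
            continue
        value = request_params[param]
        if isinstance(rule, str) and rule.startswith(">"):
            if isinstance(value, (int, float)) and value > float(rule[1:]):
                return True
        elif value == json.loads(rule):
            return True
    return False

# The two example requests above, checked against the Vertex AI deployment's rules:
vertex_disallowed = {"response_format": '{"type": "json_object"}', "n": ">1"}

text_request = {"response_format": {"type": "text"}, "max_tokens": 8192}
json_request = {"response_format": {"type": "json_object"}, "max_tokens": 8192}

print(violates_disallowed_params(text_request, vertex_disallowed))   # False -> Vertex AI stays eligible
print(violates_disallowed_params(json_request, vertex_disallowed))   # True  -> fall through to Gemini (AI Studio)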

Motivation, pitch

Right now, Vertex AI (not LiteLLM) is pretty broken when using JSON mode with Gemini 1.5 Pro; it throws 500s on the majority of requests. It would be nice if I could route only the requests that use response_format to Gemini (AI Studio) instead of Vertex AI.

Twitter / LinkedIn details

https://twitter.com/DaveManouchehri

krrishdholakia commented 2 months ago

That's interesting - why not just make it a pre-call check that filters out the deployments which violate the conditions? That way it would work across all routing strategies.

https://github.com/BerriAI/litellm/blob/0b0be700fc05bf37c8cb1b4d37e7b19f8578e0c9/litellm/router.py#L2713

We do this today for context window checks - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window
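
For comparison, here is a rough sketch of the pre-call-check shape being described: filter the candidate deployments in a model group before the routing strategy picks one, the same way the context-window pre-call check drops deployments that are too small. The deployment dicts mirror the model_list entries above; the filter_deployments helper and the rule matcher are hypothetical, not an existing Router hook:

import json

def matches_rule(value, rule) -> bool:
    # Hypothetical rule matcher (same convention as the sketch above):
    # ">N" compares numerically, anything else is a JSON-encoded literal.
    if isinstance(rule, str) and rule.startswith(">"):
        return isinstance(value, (int, float)) and value > float(rule[1:])
    return value == json.loads(rule)

def filter_deployments(deployments: list[dict], request_params: dict) -> list[dict]:
    # Drop deployments whose disallowed_parameters match the incoming request;
    # whatever survives is handed to the normal routing strategy.
    eligible = []
    for deployment in deployments:
        disallowed = deployment.get("litellm_params", {}).get("disallowed_parameters", {})
        violated = any(
            param in request_params and matches_rule(request_params[param], rule)
            for param, rule in disallowed.items()
        )
        if not violated:
            eligible.append(deployment)
    return eligible

model_list = [
    {"model_name": "gemini-1.5-pro-preview-0409",
     "litellm_params": {"model": "vertex_ai/gemini-1.5-pro-preview-0409",
                        "disallowed_parameters": {"response_format": '{"type": "json_object"}', "n": ">1"}}},
    {"model_name": "gemini-1.5-pro-preview-0409",
     "litellm_params": {"model": "gemini/gemini-1.5-pro-latest"}},
]

json_request = {"response_format": {"type": "json_object"}, "max_tokens": 8192}
print([d["litellm_params"]["model"] for d in filter_deployments(model_list, json_request)])
# ['gemini/gemini-1.5-pro-latest']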