BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Parameter based routing #3364

Open Manouchehri opened 2 months ago

Manouchehri commented 2 months ago

The Feature

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: europe-west2
      disallowed_parameters: {"response_format": '{"type": "json_object"}', "n": ">1"}

  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: gemini/gemini-1.5-pro-latest

For example, if this request comes in, route it to Vertex AI.

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "text"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'

If this request comes in, route it to Gemini (AI Studio).

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gemini-1.5-pro-preview-0409",
    "response_format": {"type": "json_object"},
    "max_tokens": 8192,
    "messages": [
      {
        "role": "user",
        "content": "tell me a joke in JSON"
      }
    ]
  }'
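
To make the proposed behavior concrete, here is a minimal Python sketch of how the disallowed_parameters matching could work. The function name and the rule syntax (a JSON-encoded literal means "must not equal this value", a string like ">1" means a numeric comparison) are assumptions for illustration, not existing LiteLLM behavior:

import json

def violates_disallowed_params(request_params: dict, disallowed: dict) -> bool:
    """Return True if the request hits any of the deployment's disallowed parameters.

    Assumed rule syntax: a string starting with ">" is a numeric comparison,
    anything else is a JSON-encoded literal compared for equality.
    """
    for param, rule in disallowed.items():
        if param not in request_params:
            continue
        value = request_params[param]
        if isinstance(rule, str) and rule.startswith(">"):
            if isinstance(value, (int, float)) and value > float(rule[1:]):
                return True
        elif value == json.loads(rule):
            return True
    return False

# The two example requests above, checked against the Vertex AI deployment's rules:
vertex_disallowed = {"response_format": '{"type": "json_object"}', "n": ">1"}

text_request = {"response_format": {"type": "text"}, "max_tokens": 8192}
json_request = {"response_format": {"type": "json_object"}, "max_tokens": 8192}

print(violates_disallowed_params(text_request, vertex_disallowed))   # False -> Vertex AI stays eligible
print(violates_disallowed_params(json_request, vertex_disallowed))   # True  -> fall through to Gemini (AI Studio)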

Motivation, pitch

Right now, Vertex AI (not LiteLLM) is pretty broken when using JSON mode with Gemini 1.5 Pro; it throws 500s on the majority of requests. It would be nice if I could route only the requests that use response_format to Gemini (AI Studio) instead of Vertex AI.

Twitter / LinkedIn details

https://twitter.com/DaveManouchehri

krrishdholakia commented 2 months ago

That's interesting - why not just make it a pre-call check that filters out the deployments which violate the conditions? That way it would work across all routing strategies.

https://github.com/BerriAI/litellm/blob/0b0be700fc05bf37c8cb1b4d37e7b19f8578e0c9/litellm/router.py#L2713

We do this today for context window checks - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window
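
For comparison, here is a rough sketch of the pre-call-check shape being described: filter the candidate deployments in a model group before the routing strategy picks one, the same way the context-window pre-call check drops deployments that are too small. The deployment dicts mirror the model_list entries above; the filter_deployments helper and the rule matcher are hypothetical, not an existing Router hook:

import json

def matches_rule(value, rule) -> bool:
    # Hypothetical rule matcher (same convention as the sketch above):
    # ">N" compares numerically, anything else is a JSON-encoded literal.
    if isinstance(rule, str) and rule.startswith(">"):
        return isinstance(value, (int, float)) and value > float(rule[1:])
    return value == json.loads(rule)

def filter_deployments(deployments: list[dict], request_params: dict) -> list[dict]:
    # Drop deployments whose disallowed_parameters match the incoming request;
    # whatever survives is handed to the normal routing strategy.
    eligible = []
    for deployment in deployments:
        disallowed = deployment.get("litellm_params", {}).get("disallowed_parameters", {})
        violated = any(
            param in request_params and matches_rule(request_params[param], rule)
            for param, rule in disallowed.items()
        )
        if not violated:
            eligible.append(deployment)
    return eligible

model_list = [
    {"model_name": "gemini-1.5-pro-preview-0409",
     "litellm_params": {"model": "vertex_ai/gemini-1.5-pro-preview-0409",
                        "disallowed_parameters": {"response_format": '{"type": "json_object"}', "n": ">1"}}},
    {"model_name": "gemini-1.5-pro-preview-0409",
     "litellm_params": {"model": "gemini/gemini-1.5-pro-latest"}},
]

json_request = {"response_format": {"type": "json_object"}, "max_tokens": 8192}
print([d["litellm_params"]["model"] for d in filter_deployments(model_list, json_request)])
# ['gemini/gemini-1.5-pro-latest']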