[Feature]: Overwrite `model` in responses returned to the client

Manouchehri commented 9 months ago

The Feature

Azure OpenAI returns only the model family name (e.g. gpt-4 instead of gpt-4-vision-preview), not the actual model name. As in #1810, LiteLLM should overwrite the value returned in the model field.

e.g. for the response:

{
  "id": "chatcmpl-8pyPnQbFiaR2TJ0YioxgWqgRPgRS1",
  "choices": [
    {
      "finish_reason": {
        "type": "stop",
        "stop": "<|fim_suffix|>"
      },
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Why don't skeletons fight each other? They don't have the guts.",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1707397655,
  "model": "gpt-4",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 13,
    "total_tokens": 28
  }
}

I'd like to get this returned instead:

{
  "id": "chatcmpl-8pyPnQbFiaR2TJ0YioxgWqgRPgRS1",
  "choices": [
    {
      "finish_reason": {
        "type": "stop",
        "stop": "<|fim_suffix|>"
      },
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Why don't skeletons fight each other? They don't have the guts.",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1707397655,
  "model": "gpt-4-vision-preview",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 15,
    "prompt_tokens": 13,
    "total_tokens": 28
  }
}

This is my yaml config:

model_list:
  - model_name: gpt-4-vision-preview
    litellm_params:
      model: azure/gpt-4-vision-preview
      api_key: os.environ/AZURE_API_KEY_DEMO
      api_version: "2023-12-01-preview"
      api_base: "https://demo.openai.azure.com/"
      max_tokens: 4096
      base_model: azure/gpt-4-turbo-vision-preview

Motivation, pitch

OpenAI does this properly. Since LiteLLM provides an OpenAI API interface, I think it makes more sense to follow their "spec" instead of Azure's.

{
  "id": "chatcmpl-8pyliNVqQLwO7c2dHOzIxeVORWKvh",
  "object": "chat.completion",
  "created": 1707399014,
  "model": "gpt-4-1106-vision-preview",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 15,
    "total_tokens": 28
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Why don't skeletons fight each other? They don't have the guts."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

Manouchehri commented 9 months ago

One minor note: I would like the returned model names to be like gpt-3.5-turbo-1106 and not gpt-35-turbo-1106, if possible.
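As a rough illustration of that renaming, here is a minimal sketch. `to_openai_style` is a hypothetical helper, not part of LiteLLM, and it assumes the only differences are an optional provider prefix (such as `azure/`) and the missing dot in `gpt-35`. Azure deployment names are chosen by the user, so a rule like this could only ever be best-effort.

```python
import re


def to_openai_style(name: str) -> str:
    """Best-effort conversion of an Azure-style model name to OpenAI spelling.

    Hypothetical helper for illustration only; not a LiteLLM function.
    """
    # Drop a provider prefix such as "azure/" if one is present.
    name = name.split("/", 1)[-1]
    # Azure spells the family "gpt-35"; OpenAI spells it "gpt-3.5".
    return re.sub(r"^gpt-35\b", "gpt-3.5", name)


print(to_openai_style("gpt-35-turbo-1106"))
print(to_openai_style("azure/gpt-35-turbo-0125"))
```

Names that do not start with `gpt-35` pass through unchanged apart from the prefix removal.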

Manouchehri commented 8 months ago

Bump on this? =)

Manouchehri commented 7 months ago

I tried doing this one myself, but got a bit lost on where the proper place to overwrite model is.

krrishdholakia commented 6 months ago

hey @Manouchehri seeing this late - how do you propose we get the actual model here?

Manouchehri commented 6 months ago

Pull it from model_name. For example, given this config:

  - model_name: gpt-3.5-turbo-0125
    litellm_params:
      model: azure/gpt-35-turbo
      api_version: "2024-05-01-preview"
      azure_ad_token: "oidc/google/https://example.com"
      api_base: "https://removed.openai.azure.com"
    model_info:
      base_model: azure/gpt-35-turbo-0125

Return gpt-3.5-turbo-0125 as the name.
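A minimal sketch of that idea, assuming the matching `model_list` entry and the upstream response are both available as plain dicts. `overwrite_model` is a hypothetical name used only for illustration; where such a step would actually live inside LiteLLM is not something this thread establishes.

```python
import copy


def overwrite_model(response: dict, deployment: dict) -> dict:
    """Return a copy of `response` whose `model` is the deployment's model_name.

    Hypothetical helper for illustration only; not a LiteLLM function.
    """
    patched = copy.deepcopy(response)  # leave the upstream response untouched
    patched["model"] = deployment["model_name"]
    return patched


# One parsed entry from model_list, mirroring the YAML above.
deployment = {
    "model_name": "gpt-3.5-turbo-0125",
    "litellm_params": {"model": "azure/gpt-35-turbo"},
    "model_info": {"base_model": "azure/gpt-35-turbo-0125"},
}

# A trimmed-down response as Azure would report it (the id is a placeholder).
azure_response = {
    "id": "chatcmpl-example",
    "model": "gpt-35-turbo",
    "object": "chat.completion",
}

print(overwrite_model(azure_response, deployment)["model"])  # gpt-3.5-turbo-0125
```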

krrishdholakia commented 6 months ago

oh - hmm, concerned b/c not everyone follows the same model_name convention.

How're you using this information?

Manouchehri commented 6 months ago

How about this?

  - model_name: gpt-3.5-turbo-0125
    litellm_params:
      model: azure/gpt-35-turbo
      api_version: "2024-05-01-preview"
      azure_ad_token: "oidc/google/https://example.com"
      api_base: "https://removed.openai.azure.com"
    model_info:
      base_model: azure/gpt-35-turbo-0125
      log_model_name: gpt-3.5-turbo-0125
      # model_name_log: gpt-3.5-turbo-0125

> How're you using this information?

Langfuse. It's super annoying to see gpt-35-turbo, since that doesn't tell me which version of GPT 3.5 Turbo is being used. I have to dig down into model_info -> base_model in Langfuse, which isn't even possible to filter on in their web UI yet.

In debugging/curl commands themselves. e.g.:

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo-0125",
    "messages": [
      {
        "role": "user",
        "content": "what is 1 plus 1?"
      }
    ]
  }' | jq

It drives me slightly insane to see this:

{
  "id": "chatcmpl-9VTjcM9H0lzk8RJbkFDIMQzSZQq1B",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "1 plus 1 is equal to 2.",
        "role": "assistant"
      }
    }
  ],
  "created": 1717289496,
  "model": "gpt-35-turbo",
  "object": "chat.completion",
  "system_fingerprint": "fp_811936bd4f",
  "usage": {
    "completion_tokens": 10,
    "prompt_tokens": 15,
    "total_tokens": 25
  }
}

gpt-35-turbo is not the model I requested, so it's super weird to see it returned like that.

Another example:

curl -v "${OPENAI_API_BASE}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "claude-3-haiku-20240307",
    "messages": [
      {
        "role": "user",
        "content": "what is 1 plus 1?"
      }
    ]
  }'

Seeing anthropic.claude-3-haiku-20240307-v1:0 is really confusing again, because I (as the "end" user) requested claude-3-haiku-20240307.

{
  "id": "chatcmpl-92107778-85d0-4037-917b-44d918df0c45",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "1 plus 1 equals 2.",
        "role": "assistant"
      }
    }
  ],
  "created": 1717289594,
  "model": "anthropic.claude-3-haiku-20240307-v1:0",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 14,
    "total_tokens": 30
  },
  "finish_reason": "stop"
}

I get that naming conventions are something upstream LLM providers like to be creative about, but it'd be really good if LiteLLM could fix it. :)

krrishdholakia commented 6 months ago

> I have to dig down into model_info -> base_model

got it - so if base_model is set, we can default to that so it's more precise

> "model": "anthropic.claude-3-haiku-20240307-v1:0",

makes sense, if i'm calling a model group on litellm i should probably see a consistent name here, irrespective of the underlying provider

Manouchehri commented 6 months ago

> got it - so if base_model is set, we can default to that so it's more precise

Not quite. In the case of gpt-35-turbo-0125, I would actually want gpt-3.5-turbo-0125 to be logged and returned. So it might make sense to add a new log_model_name or model_name_log field to allow overriding it. (Personally, I would be fine if you just take model_name; not sure if that breaks anything for other folks though.)
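The precedence described here could be sketched roughly as follows. `log_model_name` is the field proposed in this thread, not an existing LiteLLM option, and `resolve_returned_model` is a hypothetical helper: it prefers the explicit override, then `model_name`, and only falls back to whatever the upstream provider reported.

```python
from typing import Optional


def resolve_returned_model(deployment: dict, upstream_model: Optional[str]) -> Optional[str]:
    """Pick the model name to return to the client and to log.

    Order: proposed model_info.log_model_name override, then model_name,
    then the name reported by the upstream provider.
    Hypothetical helper for illustration only; not a LiteLLM function.
    """
    model_info = deployment.get("model_info") or {}
    return (
        model_info.get("log_model_name")
        or deployment.get("model_name")
        or upstream_model
    )


with_override = {
    "model_name": "my-gpt-35-group",
    "model_info": {
        "base_model": "azure/gpt-35-turbo-0125",
        "log_model_name": "gpt-3.5-turbo-0125",  # proposed field, an assumption
    },
}
without_override = {"model_name": "gpt-3.5-turbo-0125"}

print(resolve_returned_model(with_override, "gpt-35-turbo"))
print(resolve_returned_model(without_override, "gpt-35-turbo"))
```

Keeping the override optional means deployments whose `model_name` does not follow the provider's naming would still behave as they do today unless the field is set.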

> makes sense, if i'm calling a model group on litellm i should probably see a consistent name here, irrespective of the underlying provider

100%. =)

krrishdholakia commented 6 months ago

how do you use the model name on langfuse? @Manouchehri

Manouchehri commented 6 months ago

For filtering like this.

Manouchehri commented 6 months ago

It's really confusing/annoying that I have two different gpt-4o-2024-05-13 models, see screenshot below.

Manouchehri commented 6 months ago

Example for GPT 3.5 0125 confusion:

These should both be returned and logged as gpt-3.5-turbo-0125. Actually, some of them were gpt-3.5-turbo-1106, but I have no easy way of telling them apart retroactively.

Manouchehri commented 2 months ago

Bump on this?

BerriAI / litellm