BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Finetuned Gemini-Pro models don't work with provider vertex_ai #5678

Closed: the-wdr closed this issue 3 days ago

the-wdr commented 4 days ago

What happened?

Bug Report: Finetuned Gemini-Pro Models Do Not Work with Vertex AI Provider


Finetuned Gemini-Pro models are not functioning correctly with the vertex_ai provider in our setup. When attempting to make a prediction using a finetuned Gemini-Pro model, the following error is returned:

{
    "error": {
        "message": "litellm.InternalServerError: VertexAIException InternalServerError - 400 Gemini cannot be accessed through Vertex Predict/RawPredict API. Please follow https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal for Gemini usage.\nReceived Model Group=seo-gemini-1-0\nAvailable Model Group Fallbacks=None",
        "type": null,
        "param": null,
        "code": "500"
    }
}

Steps to Reproduce:

  1. Use the following configuration in config.yaml:

    - model_name: gemini-pro
      litellm_params:
        model: vertex_ai/gemini-1.5-pro-001
        vertex_project: <PROJECT_ID>
        vertex_location: <LOCATION>
    - model_name: finetuned-gemini
      litellm_params:
        model: vertex_ai/<ENDPOINT_ID>
        vertex_project: <PROJECT_ID>
        vertex_location: <LOCATION>
      model_info:
        base_model: vertex_ai/gemini-pro
  2. Test the non-finetuned model first. The base gemini-pro model works as expected, returning a completion without error:

    curl --location 'http://127.0.0.1:4000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <LITELLM_KEY>' \
    --data '{"model": "gemini-pro", "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'

    Result

    {
        "id": "chatcmpl-5d7c1424-d96e-4089-a319-ffa690ef4477",
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": "Hi there! 👋  What can I do for you today? 😊 \n",
                    "role": "assistant",
                    "tool_calls": null,
                    "function_call": null
                }
            }
        ],
        "created": 1726220014,
        "model": "gemini-1.5-pro-001",
        "object": "chat.completion",
        "system_fingerprint": null,
        "usage": {
            "completion_tokens": 16,
            "prompt_tokens": 1,
            "total_tokens": 17
        }
    }
  3. Attempt the same request against the finetuned-gemini model via the LiteLLM chat completions API (a roughly equivalent SDK call is sketched after this list):

    curl --location 'http://127.0.0.1:4000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <LITELLM_KEY>' \
    --data '{"model": "finetuned-gemini", "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'
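
For reference, a roughly equivalent call through the Python SDK, bypassing the proxy, looks like this (a sketch on my part; <ENDPOINT_ID>, <PROJECT_ID>, and <LOCATION> are the same placeholders as in the config above):

    # Rough SDK equivalent of step 3, useful to check whether the proxy or the
    # underlying provider call is at fault. Placeholders as in config.yaml above.
    import litellm

    response = litellm.completion(
        model="vertex_ai/<ENDPOINT_ID>",  # finetuned model, addressed by endpoint ID
        messages=[{"role": "user", "content": "hi"}],
        vertex_project="<PROJECT_ID>",
        vertex_location="<LOCATION>",
    )
    print(response.choices[0].message.content)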

Expected Behavior:

The finetuned Gemini-Pro model should work with the vertex_ai provider as expected, returning completions without error.

Actual Behavior:

The API returns the same 500 error shown at the top of this report ("400 Gemini cannot be accessed through Vertex Predict/RawPredict API ...").


Please advise on how to resolve this issue, or clarify whether these models require a different approach when used via the LiteLLM proxy.


krrishdholakia commented 4 days ago

@the-wdr workaround for this is to use the pass-through endpoint - https://docs.litellm.ai/docs/pass_through/vertex_ai
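
A minimal sketch of that workaround, assuming the /vertex-ai pass-through route described in the linked docs (the exact path may differ by litellm version, and pointing it at a finetuned endpoint rather than a base model is my assumption; verify against the docs):

    # Sketch of the pass-through workaround: the proxy forwards the raw Vertex
    # request, so you call generateContent directly. Route shown per the linked
    # docs; check it for your litellm version.
    import requests

    resp = requests.post(
        # For a finetuned model you would target its endpoint path instead of
        # publishers/google/models/... (assumption, see the linked docs).
        "http://127.0.0.1:4000/vertex-ai/publishers/google/models/gemini-1.5-pro-001:generateContent",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <LITELLM_KEY>",
        },
        json={"contents": [{"role": "user", "parts": [{"text": "hi"}]}]},
    )
    print(resp.json())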

krrishdholakia commented 4 days ago

Interesting:

- model_name: finetuned-gemini
  litellm_params:
    model: vertex_ai/<ENDPOINT_ID>
    vertex_project: <PROJECT_ID>
    vertex_location: <LOCATION>
  model_info:
    base_model: vertex_ai/gemini-pro

I see the use of base_model. I believe we can use that to route the call correctly (for Gemini models I believe the correct route is the generateContent endpoint).
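
A hypothetical sketch of that routing idea (not litellm's actual code; the helper name is made up):

    # Hypothetical routing helper: a finetuned Vertex model is addressed only by
    # an opaque endpoint ID, so inspect the user-supplied base_model to decide
    # which API surface to call.
    def pick_vertex_route(model: str, base_model: str | None) -> str:
        name = (base_model or model).lower()
        if "gemini" in name:
            return "generateContent"  # chat-style Gemini API
        return "rawPredict"  # Model Garden predict API

    # base_model tells us the endpoint ID is a finetuned Gemini:
    assert pick_vertex_route("vertex_ai/1234567890", "vertex_ai/gemini-pro") == "generateContent"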

ishaan-jaff commented 4 days ago

We already support fine-tuned models on Vertex AI (https://docs.litellm.ai/docs/providers/vertex#fine-tuned-models) and send them to the correct endpoint.

@the-wdr what version of litellm are you on? Can you try the latest version?

link to relevant test: https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/tests/test_amazing_vertex_completion.py#L2230

krrishdholakia commented 4 days ago

@ishaan-jaff we support finetuned models, but I believe they're currently routed to the Vertex AI Model Garden predict endpoint.

Here's what I'm looking at in the code to confirm this: https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/main.py#L2126

ishaan-jaff commented 4 days ago

@krrishdholakia how would a finetuned model enter that branch? A fine-tuned model has model=vertex_ai/<ENDPOINT_ID>; I don't see "gemini" in there.

Relevant PR adding vertex_ai fine-tuned support: https://github.com/BerriAI/litellm/pull/5371

ishaan-jaff commented 4 days ago

I suspect the issue is that we expect Vertex fine-tuned models to use the vertex_ai_beta prefix, so vertex_ai/<ENDPOINT_ID> does not get routed correctly.
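
If that diagnosis is right, a possible interim workaround (an untested sketch on my part) is to address the finetuned endpoint via vertex_ai_beta:

    # Untested sketch implied by the diagnosis above: the vertex_ai_beta prefix
    # should route the endpoint-ID model to the generateContent API.
    import litellm

    response = litellm.completion(
        model="vertex_ai_beta/<ENDPOINT_ID>",  # note: vertex_ai_beta, not vertex_ai
        messages=[{"role": "user", "content": "hi"}],
        vertex_project="<PROJECT_ID>",
        vertex_location="<LOCATION>",
    )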

ishaan-jaff commented 4 days ago

I'm able to repro locally when using vertex_ai instead of vertex_ai_beta. Working on a fix.

=============================================================================================================== short test summary info ===============================================================================================================
FAILED test_amazing_vertex_completion.py::test_completion_fine_tuned_model - litellm.exceptions.InternalServerError: litellm.InternalServerError: VertexAIException InternalServerError - 400 Gemini cannot be accessed through Vertex Predict/RawPredict API. Please follow https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal for Gemini usage.
krrishdholakia commented 4 days ago

@ishaan-jaff already on it

krrishdholakia commented 4 days ago

I think we can just use the base_model given.

ishaan-jaff commented 4 days ago

> already on it

Sounds good, I'll let you fix it then.

ishaan-jaff commented 1 day ago

Hi @the-wdr, curious: do you use LiteLLM today? If so, I'd love to hop on a call and learn how we can improve LiteLLM for you.