BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Finetuned Gemini-Pro models don't work with provider vertex_ai #5678

Closed: the-wdr closed this issue 3 days ago

the-wdr commented 4 days ago

What happened?

Bug Report: Finetuned Gemini-Pro Models Do Not Work with Vertex AI Provider


Finetuned Gemini-Pro models are not functioning correctly with the vertex_ai provider in our setup. When attempting to make a prediction using a finetuned Gemini-Pro model, the following error is returned:

{
    "error": {
        "message": "litellm.InternalServerError: VertexAIException InternalServerError - 400 Gemini cannot be accessed through Vertex Predict/RawPredict API. Please follow https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal for Gemini usage.\nReceived Model Group=seo-gemini-1-0\nAvailable Model Group Fallbacks=None",
        "type": null,
        "param": null,
        "code": "500"
    }
}

Steps to Reproduce:

  1. Use the following configuration in config.yaml:

    - model_name: gemini-pro
      litellm_params:
        model: vertex_ai/gemini-1.5-pro-001
        vertex_project: <PROJECT_ID>
        vertex_location: <LOCATION>
    - model_name: finetuned-gemini
      litellm_params:
        model: vertex_ai/<ENDPOINT_ID>
        vertex_project: <PROJECT_ID>
        vertex_location: <LOCATION>
      model_info:
        base_model: vertex_ai/gemini-pro
  2. Test the non-finetuned model first. The base gemini-pro model works as expected, returning a completion without error:

    curl --location 'http://127.0.0.1:4000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <LITELLM_KEY>' \
    --data '{"model": "gemini-pro", "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'

    Result

    {
        "id": "chatcmpl-5d7c1424-d96e-4089-a319-ffa690ef4477",
        "choices": [
            {
                "finish_reason": "stop",
                "index": 0,
                "message": {
                    "content": "Hi there! 👋  What can I do for you today? 😊 \n",
                    "role": "assistant",
                    "tool_calls": null,
                    "function_call": null
                }
            }
        ],
        "created": 1726220014,
        "model": "gemini-1.5-pro-001",
        "object": "chat.completion",
        "system_fingerprint": null,
        "usage": {
            "completion_tokens": 16,
            "prompt_tokens": 1,
            "total_tokens": 17
        }
    }
  3. Attempt the same request against the finetuned-gemini model via the LiteLLM chat completions API (a roughly equivalent SDK call is sketched after this list):

    curl --location 'http://127.0.0.1:4000/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer <LITELLM_KEY>' \
    --data '{"model": "finetuned-gemini", "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}]}'
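
For reference, a roughly equivalent call through the Python SDK, bypassing the proxy, looks like this (a sketch on my part; <ENDPOINT_ID>, <PROJECT_ID>, and <LOCATION> are the same placeholders as in the config above):

    # Rough SDK equivalent of step 3, useful to check whether the proxy or the
    # underlying provider call is at fault. Placeholders as in config.yaml above.
    import litellm

    response = litellm.completion(
        model="vertex_ai/<ENDPOINT_ID>",  # finetuned model, addressed by endpoint ID
        messages=[{"role": "user", "content": "hi"}],
        vertex_project="<PROJECT_ID>",
        vertex_location="<LOCATION>",
    )
    print(response.choices[0].message.content)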

Expected Behavior:

The finetuned Gemini-Pro model should work with the vertex_ai provider as expected, returning completions without error.

Actual Behavior:

The API returns the same 500 error shown at the top of this report ("400 Gemini cannot be accessed through Vertex Predict/RawPredict API ...").


Please advise on how to resolve this issue, or clarify whether these models require a different approach when used via the LiteLLM proxy.


krrishdholakia commented 4 days ago

@the-wdr workaround for this is to use the pass-through endpoint - https://docs.litellm.ai/docs/pass_through/vertex_ai
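
A minimal sketch of that workaround, assuming the /vertex-ai pass-through route described in the linked docs (the exact path may differ by litellm version, and pointing it at a finetuned endpoint rather than a base model is my assumption; verify against the docs):

    # Sketch of the pass-through workaround: the proxy forwards the raw Vertex
    # request, so you call generateContent directly. Route shown per the linked
    # docs; check it for your litellm version.
    import requests

    resp = requests.post(
        # For a finetuned model you would target its endpoint path instead of
        # publishers/google/models/... (assumption, see the linked docs).
        "http://127.0.0.1:4000/vertex-ai/publishers/google/models/gemini-1.5-pro-001:generateContent",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <LITELLM_KEY>",
        },
        json={"contents": [{"role": "user", "parts": [{"text": "hi"}]}]},
    )
    print(resp.json())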

krrishdholakia commented 4 days ago

Interesting:

- model_name: finetuned-gemini
  litellm_params:
    model: vertex_ai/<ENDPOINT_ID>
    vertex_project: <PROJECT_ID>
    vertex_location: <LOCATION>
  model_info:
    base_model: vertex_ai/gemini-pro

I see the use of base_model. I believe we can use that to route the call correctly (for Gemini models I believe the correct route is the generateContent endpoint).
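
A hypothetical sketch of that routing idea (not litellm's actual code; the helper name is made up):

    # Hypothetical routing helper: a finetuned Vertex model is addressed only by
    # an opaque endpoint ID, so inspect the user-supplied base_model to decide
    # which API surface to call.
    def pick_vertex_route(model: str, base_model: str | None) -> str:
        name = (base_model or model).lower()
        if "gemini" in name:
            return "generateContent"  # chat-style Gemini API
        return "rawPredict"  # Model Garden predict API

    # base_model tells us the endpoint ID is a finetuned Gemini:
    assert pick_vertex_route("vertex_ai/1234567890", "vertex_ai/gemini-pro") == "generateContent"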

ishaan-jaff commented 4 days ago

We already support fine-tuned models on Vertex AI (https://docs.litellm.ai/docs/providers/vertex#fine-tuned-models) and send them to the correct endpoint.

@the-wdr what version of litellm are you on? Can you try the latest version?

link to relevant test: https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/tests/test_amazing_vertex_completion.py#L2230

krrishdholakia commented 4 days ago

@ishaan-jaff we support finetuned models, but I believe they're currently routed to the Vertex AI Model Garden predict endpoint.

Here's what I'm looking at in the code to confirm this: https://github.com/BerriAI/litellm/blob/cd8d7ca9156a5fc2510db1ef0d43956d3239eccf/litellm/main.py#L2126

ishaan-jaff commented 4 days ago

@krrishdholakia how would a finetuned model enter that branch? A fine-tuned model has model=vertex_ai/<ENDPOINT_ID>; I don't see "gemini" in there.

Relevant PR adding vertex_ai fine-tuned support: https://github.com/BerriAI/litellm/pull/5371

ishaan-jaff commented 4 days ago

I suspect the issue is that we expect Vertex fine-tuned models to use the vertex_ai_beta prefix, so vertex_ai/<ENDPOINT_ID> does not get routed correctly.
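
If that diagnosis is right, a possible interim workaround (an untested sketch on my part) is to address the finetuned endpoint via vertex_ai_beta:

    # Untested sketch implied by the diagnosis above: the vertex_ai_beta prefix
    # should route the endpoint-ID model to the generateContent API.
    import litellm

    response = litellm.completion(
        model="vertex_ai_beta/<ENDPOINT_ID>",  # note: vertex_ai_beta, not vertex_ai
        messages=[{"role": "user", "content": "hi"}],
        vertex_project="<PROJECT_ID>",
        vertex_location="<LOCATION>",
    )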

ishaan-jaff commented 4 days ago

I'm able to repro locally when using vertex_ai instead of vertex_ai_beta. Working on a fix.

=============================================================================================================== short test summary info ===============================================================================================================
FAILED test_amazing_vertex_completion.py::test_completion_fine_tuned_model - litellm.exceptions.InternalServerError: litellm.InternalServerError: VertexAIException InternalServerError - 400 Gemini cannot be accessed through Vertex Predict/RawPredict API. Please follow https://cloud.google.com/vertex-ai/docs/generative-ai/start/quickstarts/quickstart-multimodal for Gemini usage.
krrishdholakia commented 4 days ago

@ishaan-jaff already on it

krrishdholakia commented 4 days ago

I think we can just use the base_model given.

ishaan-jaff commented 4 days ago

> already on it

Sounds good, I'll let you fix it then.

ishaan-jaff commented 1 day ago

Hi @the-wdr, curious: do you use LiteLLM today? If so, I'd love to hop on a call and learn how we can improve LiteLLM for you.