[Bug]: Issue with Image URL Handling in Gemini Models for Vision Capabilities

mvrodrig commented 4 days ago

What happened?

Description:
The curl request specified in your documentation for Gemini models with vision capabilities does not work as expected. The request fails when using the gemini/gemini-1.5-pro model but works correctly for the same model through Vertex AI.

This issue is particularly important because I need to use the gemini-exp-1121 model, which is available exclusively through Gemini - Google AI Studio and not via Vertex AI.

The problem seems to be related to the image_url provided in the content field. Below are the details:

Example That Fails:

Request:

curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "gemini/gemini-1.5-pro",
  "messages": [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://i.pinimg.com/736x/b4/b1/be/b4b1becad04d03a9071db2817fc9fe77.jpg"}
            }
        ]
    }
  ],
  "temperature": 0.1
}'

Response:

{
    "error": {
        "message": "litellm.BadRequestError: VertexAIException BadRequestError - {\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"Invalid or unsupported file uri: https://i.pinimg.com/736x/b4/b1/be/b4b1becad04d03a9071db2817fc9fe77.jpg\",\n    \"status\": \"INVALID_ARGUMENT\"\n  }\n}\n\nReceived Model Group=gemini/gemini-1.5-pro\nAvailable Model Group Fallbacks=None",
        "type": null,
        "param": null,
        "code": "400"
    }
}

Example That Works (Using Vertex AI):

Request:

curl --request POST \
  --url http://localhost:4000/v1/chat/completions \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "vertex_ai/gemini-1.5-pro",
  "messages": [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {"url": "https://i.pinimg.com/736x/b4/b1/be/b4b1becad04d03a9071db2817fc9fe77.jpg"}
            }
        ]
    }
  ],
  "temperature": 0.1
}'

Response:

{
    "id": "chatcmpl-ea379a41-ad03-4047-9aab-b8e4a4a6551f",
    "created": 1732540887,
    "model": "gemini-1.5-pro",
    "object": "chat.completion",
    "choices": [
        {
            "message": {
                "content": "The image shows three characters from the Disney movie \"The Lion King\"...",
                "role": "assistant"
            }
        }
    ]
}

Please advise on how to resolve this discrepancy or whether additional configurations are required to enable vision capabilities for the Gemini model.

krrishdholakia commented 4 days ago

Invalid or unsupported file uri: https://i.pinimg.com/736x/b4/b1/be/b4b1becad04d03a9071db2817fc9fe77.jpg

It looks like Vertex AI supports reading images from URLs, but Google AI Studio doesn't.
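
A possible client-side workaround, not confirmed in this thread, is to download the image yourself and send it inline as a base64 data URL instead of a remote URL. The sketch below assumes the proxy accepts data URLs in the image_url field for gemini/ models and that the image is a JPEG. The helper names (to_data_url, build_payload, fetch_image, main) are hypothetical and exist only for illustration; the endpoint, model name, and image URL are taken from the failing request above.

```python
import base64
import json
import urllib.request

# Values copied from the failing request in this issue.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
IMAGE_URL = "https://i.pinimg.com/736x/b4/b1/be/b4b1becad04d03a9071db2817fc9fe77.jpg"


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "temperature": 0.1,
    }


def fetch_image(url: str) -> bytes:
    """Download the image so it can be sent inline rather than by reference."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def main() -> None:
    # Assumption: the proxy forwards data URLs to Google AI Studio as inline data.
    data_url = to_data_url(fetch_image(IMAGE_URL))
    payload = build_payload("gemini/gemini-1.5-pro", "What's in this image?", data_url)
    request = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        body = json.loads(resp.read())
    print(body["choices"][0]["message"]["content"])
```

Calling main() with the proxy running on localhost:4000 would send the same prompt as the failing example, with the image embedded in the request body. If the proxy still rejects the request, that would suggest the problem is not limited to remote URLs.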

krrishdholakia commented 4 days ago

Able to repro.

BerriAI / litellm