BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support inline media on multimodal Gemini models with the Gemini provider #2756

Closed — cheahjs closed this issue 4 months ago

cheahjs commented 7 months ago

The Feature

Support inline, base64-encoded media when using multimodal Gemini models (Gemini 1.0 Pro Vision, Gemini 1.5 Pro) on AI Studio via the Gemini provider.

There's support for this in the Vertex AI provider: https://github.com/BerriAI/litellm/blob/0ec9088001b0fc089a4141e0a51bd4cc63fd5d02/litellm/llms/vertex_ai.py#L145-L248

Motivation, pitch

I would like to use the vision models on AI Studio (primarily for cost reasons, thanks to the free tier), but at the moment attempting to use the OpenAI-style method of inlining media results in the API returning `{'message': '400 Add an image to use models/gemini-1.0-pro-vision-latest, or switch your model to a text model.', 'type': None, 'param': None, 'code': 500}`.
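For context, a minimal sketch of the OpenAI-style inline-media format being referred to: the image is base64-encoded into a `data:` URI inside an `image_url` content part. The `build_vision_message` helper below is hypothetical (not part of litellm); the message shape follows the OpenAI multimodal chat format that litellm accepts.

```python
import base64

def build_vision_message(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build an OpenAI-format user message with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Inline media as a data URI, rather than a hosted image URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real image file read from disk.
msg = build_vision_message(b"\x89PNG\r\n", "What is in this image?")
```

A message like this would then be passed to `litellm.completion(..., messages=[msg])` with a Gemini-provider model string; the issue is that the Gemini (AI Studio) provider did not yet translate the inline `data:` URI into the request format the API expects, while the Vertex AI provider did.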

Twitter / LinkedIn details

No response

krrishdholakia commented 7 months ago

Hey @cheahjs would welcome a PR on this. Here's the relevant file - https://github.com/BerriAI/litellm/blob/141396a2148fe5bef34045509acbc9ffa4e859ca/litellm/llms/gemini.py#L4

krrishdholakia commented 4 months ago

Hey @cheahjs this is now live in v1.40.16