BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Feature]: Support upstream batching on Vertex AI (for Google Gemini) #3734

Status: Open · Manouchehri opened this issue 6 months ago

Manouchehri commented 6 months ago

The Feature

Basically the same as #3247, but for Vertex AI. Blocked by #3246 as well.

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/batch-prediction-api
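For context, a minimal sketch of what the upstream call looks like, following the batch prediction REST reference linked above: the job body names a publisher model and points `inputConfig`/`outputConfig` at GCS JSONL files. The project id, bucket paths, and model id here are placeholder assumptions, not values from this issue; this only builds the request body and does not submit a job.

```python
# Hedged sketch of the Vertex AI batch prediction job body for a Gemini
# model, per the REST docs linked above. All ids/paths are assumptions.

def build_batch_job_body(project: str, region: str, model: str,
                         input_uri: str, output_uri: str) -> dict:
    """Build the JSON body for POST .../locations/{region}/batchPredictionJobs."""
    return {
        "displayName": "gemini-batch-job",
        # Full resource name of the publisher model to batch against.
        "model": f"projects/{project}/locations/{region}"
                 f"/publishers/google/models/{model}",
        "inputConfig": {
            "instancesFormat": "jsonl",
            # Each JSONL line holds one request payload.
            "gcsSource": {"uris": [input_uri]},
        },
        "outputConfig": {
            "predictionsFormat": "jsonl",
            "gcsDestination": {"outputUriPrefix": output_uri},
        },
    }

body = build_batch_job_body(
    project="my-project",        # assumption: example project id
    region="us-central1",
    model="gemini-1.0-pro",      # assumption: example Gemini model id
    input_uri="gs://my-bucket/batch-input.jsonl",
    output_uri="gs://my-bucket/batch-output/",
)
print(body["model"])
```

Supporting this in LiteLLM would mean translating OpenAI-format batch requests into this job shape (and polling the resulting job), which is why it depends on the OpenAI-side batching work in #3247.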

Motivation, pitch

Cost savings, better rate limits, etc.

Twitter / LinkedIn details

https://twitter.com/DaveManouchehri

krrishdholakia commented 6 months ago

Is this high priority @Manouchehri?

Manouchehri commented 6 months ago

If you do #3247, I should be able to do this. 🙂