fix latency issues on google ai studio

BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

Other

14.07k stars 1.66k forks source link

These fixes got google ai studio median latency to ~40-100ms

Pre PR latency was 900ms

Relevant issues

Type

🆕 New Feature 🐛 Bug Fix 🧹 Refactoring 📖 Documentation 🚄 Infrastructure ✅ Test

Changes

[REQUIRED] Testing - Attach a screenshot of any new tests passing locall

If UI changes, send a screenshot/GIF of working UI fixes

[!NOTE] I'm currently writing a description for your pull request. I should be done shortly (<1 minute). Please don't edit the description field until I'm finished, or we may overwrite each other. If I find nothing to write about, I'll delete this message.

Name	Status	Preview	Comments	Updated (UTC)
litellm	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Nov 21, 2024 4:35pm

Name

Status

Preview

Comments

Updated (UTC)

litellm

✅ Ready (Inspect)

Visit Preview

💬 Add feedback

Nov 21, 2024 4:35pm

BerriAI / litellm