BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: groq models do not support streaming when in JSON mode #4804

Open ericmjl opened 1 month ago

ericmjl commented 1 month ago

What happened?

It appears that with LiteLLM version 1.35.38 (I have not upgraded to the latest because of other issues with Ollama JSON mode), I am unable to use groq models in JSON mode with streaming enabled. I have a minimal notebook that reproduces this issue as a GitHub gist: https://gist.github.com/ericmjl/6f3e2cbbfcf26a8f3334a58af6a76f63
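
In essence, the failing combination looks something like the sketch below (the model name and prompt are simplified placeholders, not the exact contents of the gist):

```python
# Hedged sketch of the failing combination: a Groq model with JSON mode
# (response_format) and streaming enabled at the same time.
import litellm

response = litellm.completion(
    model="groq/llama3-8b-8192",  # placeholder Groq model name
    messages=[
        {"role": "user", "content": "Return a JSON object with a 'greeting' key."}
    ],
    response_format={"type": "json_object"},  # JSON mode
    stream=True,  # combining this with response_format is what fails
)

# Iterate over the streamed chunks in the usual OpenAI-style way.
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```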

Relevant log output

You can find the notebook here: https://gist.github.com/ericmjl/6f3e2cbbfcf26a8f3334a58af6a76f63

Twitter / LinkedIn details

@ericmjl

ishaan-jaff commented 1 month ago

On the latest version I get this error @ericmjl - would you expect litellm to fake the streaming response?

 GroqException - Error code: 400 - {'error': {'message': 'response_format` does not support streaming', 'type': 'invalid_request_error'}}
ericmjl commented 1 month ago

@ishaan-jaff thinking about the problem from your perspective as a library maintainer, faking the streaming response might be good for the LiteLLM user experience, but it would also add a special case for you all to handle. I would love to see the streaming response faked (Groq is fast enough that, for all practical purposes, waiting for the full text to come back is almost as good as watching it stream), though I am cognizant of the extra burden it might put on you.
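
For what it's worth, a client-side version of that workaround can be sketched roughly like this (fake_stream is a hypothetical helper for illustration, not part of LiteLLM; the model name is a placeholder):

```python
# Rough sketch of faking a stream on the client side: make the Groq call
# without stream=True, then wrap the full response in a generator so code
# that expects chunks can still iterate over it.
import litellm

def fake_stream(model, messages, **kwargs):
    """Hypothetical helper, not part of LiteLLM: drop stream=True for the
    real call and yield the complete response text as a single chunk."""
    kwargs.pop("stream", None)
    response = litellm.completion(model=model, messages=messages, **kwargs)
    yield response.choices[0].message.content

for chunk in fake_stream(
    "groq/llama3-8b-8192",  # placeholder Groq model name
    [{"role": "user", "content": "Return a JSON object with a 'greeting' key."}],
    response_format={"type": "json_object"},  # JSON mode, no streaming sent to Groq
):
    print(chunk, end="")
```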