Open JamDon2 opened 1 day ago
Here is an example: I repeated the same prompt to see whether it would be cut off in the same way. The temperature is 0 for reproducibility, but the same thing happens with other values.
There is nothing in the logs, and no errors.
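To make runs like this easier to compare, here is a rough client-side sketch that repeats one prompt through the proxy and records the accumulated text plus the final `finish_reason`, so a truncated run shows up as a shorter string. The proxy URL, API key, and the use of the `openai` client are assumptions about the setup, not confirmed details of this deployment.

```python
def collect_stream(chunks):
    """Join streamed deltas and capture the final finish_reason."""
    text, finish_reason = [], None
    for chunk in chunks:
        choice = chunk.choices[0]
        if choice.delta.content:
            text.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(text), finish_reason

def reproduce(prompt, runs=10):
    """Repeat the same prompt and collect (text, finish_reason) per run."""
    # Requires `pip install openai`; URL/key/model are placeholders.
    from openai import OpenAI
    client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-placeholder")
    results = []
    for _ in range(runs):
        stream = client.chat.completions.create(
            model="gemini-1.5-flash",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic, so truncation shows as shorter output
            stream=True,
        )
        results.append(collect_stream(stream))
    return results
```

With temperature 0, any run whose text is shorter than the others is a candidate cut-off.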
Did the stream just end? Can you try sharing an example with --detailed_debug enabled? @JamDon2
IIRC their stream sometimes changes and returns partial JSONs - https://github.com/BerriAI/litellm/blob/0d0f46a826c42f52db56bfdc4e0dbf6913652671/litellm/tests/test_streaming.py#L865
Perhaps this is related to that?
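For context, "partial JSONs" here means a chunk boundary can land in the middle of an object, so a client has to buffer fragments until a complete object can be decoded. A minimal stdlib-only sketch of that buffering (not LiteLLM's actual parser):

```python
import json

class PartialJSONBuffer:
    """Accumulate raw stream fragments and yield complete JSON objects.

    Sketch of the buffering a client needs when a provider's stream can
    split a JSON object across chunks; not LiteLLM's real implementation.
    """

    def __init__(self):
        self._buf = ""

    def feed(self, fragment):
        """Append a fragment; return any objects that are now complete."""
        self._buf += fragment
        objects = []
        decoder = json.JSONDecoder()
        while self._buf.strip():
            stripped = self._buf.lstrip()
            try:
                obj, end = decoder.raw_decode(stripped)
            except ValueError:
                break  # incomplete object: wait for more data
            objects.append(obj)
            self._buf = stripped[end:]
        return objects
```

If a consumer decodes each fragment eagerly instead of buffering like this, a mid-object chunk boundary looks like a decode failure and the stream can be dropped early.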
I'm currently looking through the logs, and I see this error sometimes:
ValueError: User doesn't exist in db. 'user_id'=admin. Create user via
/user/newcall.
It appears randomly, not when making a request, and the UI is not open.
This looks like the relevant part. So what this means is that the Vertex AI endpoint returned "I" and then stopped the completion?
INFO: 172.18.0.1:41794 - "POST /v1/chat/completions HTTP/1.1" 200 OK
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='I', role='assistant', function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(content=None, role=None, function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
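Reading the two chunks above: the first carries delta content "I", and the second carries finish_reason 'stop' with no content and no tool_calls, so from the proxy's side the upstream closed the stream normally after a single token. A small helper to flag that pattern when scanning chunks (the field names mirror the ModelResponse shape in the log; the length threshold is an arbitrary assumption):

```python
def summarize_stream(chunks):
    """Classify a streamed completion from its chunk dicts.

    Flags the suspicious pattern in the log above: a clean 'stop'
    finish_reason after almost no content and no tool calls.
    """
    text, finish_reason, saw_tool_call = "", None, False
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice["delta"]
        if delta.get("content"):
            text += delta["content"]
        if delta.get("tool_calls"):
            saw_tool_call = True
        if choice["finish_reason"] is not None:
            finish_reason = choice["finish_reason"]
    # Arbitrary threshold: a "real" answer should exceed a few characters.
    suspicious = finish_reason == "stop" and not saw_tool_call and len(text) < 5
    return {"text": text, "finish_reason": finish_reason, "suspicious": suspicious}
```

Running this over the two logged chunks would mark the completion as suspicious, which matches the reported behavior.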
Hmm, none of this explains why a stream would stop. Can you email me (krrish@berri.ai) the complete logs, or we can debug over a call? https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
What happened?
When using LiteLLM Proxy with streaming, the response often (around 20% of the time) gets cut off. The model was going to use a tool in that response, but it was cut off before that.
I am using Vertex AI with Gemini 1.5 Flash. There is nothing in the logs, and no errors.
Relevant log output
No response
Twitter / LinkedIn details
No response