Closed: bachya closed this issue 2 weeks ago
A quick note: non-streaming calls (i.e., `completion` calls) always return a complete response.
value of async chunk: {'text': '', 'tool_use': None, 'is_finished': False, 'finish_reason': 'content_filter', 'usage': {'prompt_tokens': 37, 'completion_tokens': 17, 'total_tokens': 54}, 'index': 0}
@bachya It looks like this request is being stopped mid-stream by Gemini due to a `content_filter` error.
It also looks like our final response doesn't return the correct `finish_reason`:
ModelResponse(id='chatcmpl-044ed017-961c-4581-bdbe-2cd504b01dc4', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(content=None, role=None, function_call=None, tool_calls=None), logprobs=None)], created=1728337451, model='gemini-1.5-pro', object='chat.completion.chunk', system_fingerprint=None)
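A minimal sketch of how the mismatch could be surfaced while consuming the stream (the helper below is hypothetical, not part of litellm): record every non-`None` `finish_reason` seen across chunks, so a mid-stream `content_filter` isn't masked by a final `stop`. The chunk shape mirrors the dict in the log above.

```python
# Hypothetical helper: scan streamed chunks (dicts shaped like the async
# chunk logged above) and collect every finish_reason that was reported.
def collect_finish_reasons(chunks):
    """Return the list of non-None finish_reason values, in stream order."""
    reasons = []
    for chunk in chunks:
        reason = chunk.get("finish_reason")
        if reason is not None:
            reasons.append(reason)
    return reasons

# Simulated stream matching the logs: Gemini stops with content_filter,
# but the final chunk reports a plain "stop".
chunks = [
    {"text": "partial ", "finish_reason": None},
    {"text": "", "finish_reason": "content_filter"},
    {"text": "", "finish_reason": "stop"},
]

print(collect_finish_reasons(chunks))  # ['content_filter', 'stop']
```

With a scan like this, the earlier `content_filter` event is visible even though the aggregated `ModelResponse` reports `finish_reason='stop'`.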
What happened?
Using `litellm` 1.48.18, this simple code:...will regularly return inconsistent results: sometimes (though irregularly) I'll get a complete response, but more often the stream stops prematurely. See below for outputs.
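Since the snippet itself is elided above, here is a hedged sketch of the streaming pattern being described (the model string, prompt, and aggregation helper are assumptions for illustration, not the reporter's actual code): stream a completion and join the chunk text, which is where the premature stop shows up.

```python
# Hypothetical aggregation helper: join the text of each streamed chunk
# into the full response string. Chunks are assumed to be dicts with a
# "text" key, matching the chunk logged elsewhere in this thread.
def aggregate_stream(chunks):
    """Concatenate chunk text, treating missing/None text as empty."""
    return "".join(chunk.get("text") or "" for chunk in chunks)

# With litellm, the stream would come from something like:
#
#   import litellm
#   stream = litellm.completion(
#       model="gemini/gemini-1.5-pro",  # assumed model string
#       messages=[{"role": "user", "content": "..."}],
#       stream=True,
#   )
#
# A fake stream stands in here so the helper is runnable offline:
fake_stream = [{"text": "Hello, "}, {"text": "world."}, {"text": None}]
print(aggregate_stream(fake_stream))  # Hello, world.
```

When the bug occurs, the aggregated string ends mid-sentence even though the final chunk claims `finish_reason='stop'`.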
Relevant log output
Example of truncated response:
Example of complete response: