BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, Groq (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: handle Mistral's incorrect streaming format #1371

Closed ishaan-jaff closed 4 months ago

ishaan-jaff commented 6 months ago

What happened?

litellm --model mistral/mistral-medium --drop_params

litellm --test
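For context, `--test` first makes a streaming chat-completion request and then a plain text-completion request through the proxy (the `client.completions.create` call visible in the traceback below). A minimal sketch of an equivalent reproduction with the OpenAI SDK, assuming the proxy is listening on its default local address and that any placeholder api key is accepted:

```python
# Hypothetical reproduction of what `litellm --test` does against the local proxy.
# The base_url, api_key, and prompt text are assumptions; adjust to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

# 1) Streaming chat completion -- this part succeeds against mistral-medium.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the proxy routes this to the model it was started with
    messages=[{"role": "user", "content": "write a short poem"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

# 2) Plain text completion -- this is the request that fails with the 400 below.
response = client.completions.create(
    model="gpt-3.5-turbo",
    prompt="write a short poem",
)
print(response.choices[0].text)
```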

Relevant log output

LiteLLM: Making a test ChatCompletions + streaming request to proxy. Model=gpt-3.5-turbo
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='In the', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0)], created=1704731505, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' quiet of the night,', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731505, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=" under the moon'", function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731505, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='s gentle light,\n', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731505, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='A whispering bree', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='ze stirs the', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' leaves', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' with delight.\n', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='The stars twinkle', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' bright, casting a', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' silver light,\nIn', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' this peaceful moment,', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' all feels just right', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='.\n\nSo take', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731506, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' a deep breath, and', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' let go of your', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' sight,\nClose your', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' eyes and listen, to', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' the stillness of the', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' night.\nLet', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' your worries drift', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' away, like a fe', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='ather in flight,', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='\nAnd let the night', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731507, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content="'s calmness,", function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731508, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content=' bring you pure delight', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731508, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='cmpl-04a22dde86c9405b82032666b4fb02d0', choices=[Choice(delta=ChoiceDelta(content='.', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0)], created=1704731508, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})
LiteLLM: streaming response from proxy ChatCompletionChunk(id='chatcmpl-1c0b378c-7a41-49c3-9090-222dcd47afcd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0)], created=1704731508, model='mistral-medium', object='chat.completion.chunk', system_fingerprint=None, usage={})

 making completion request to proxy
Traceback (most recent call last):
  File "/Users/jakekoenig/mentat/.venv/bin/litellm", line 8, in <module>
    sys.exit(run_server())
             ^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/proxy/proxy_cli.py", line 365, in run_server
    response = client.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_utils/_utils.py", line 299, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/resources/completions.py", line 559, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 1055, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 834, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 877, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'detail': 'OpenAIException - Error code: 400 - {\'object\': \'error\', \'message\': \'Expected last role to be user but got RoleEnum.system\', \'type\': \'internal_error_proxy\', \'param\': None, \'code\': \'1000\'}\n\nTraceback (most recent call last):\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/main.py", line 2510, in atext_completion\n    response = await response\n               ^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/llms/openai.py", line 402, in acompletion\n    raise e\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/llms/openai.py", line 387, in acompletion\n    response = await openai_aclient.chat.completions.create(\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 1199, in create\n    return await self._post(\n           ^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 1474, in post\n    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 1275, in request\n    return await self._request(\n           ^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/openai/_base_client.py", line 1318, in _request\n    raise self._make_status_error_from_response(err.response) from None\nopenai.BadRequestError: Error code: 400 - {\'object\': \'error\', \'message\': \'Expected last role to be user but got RoleEnum.system\', \'type\': \'internal_error_proxy\', \'param\': None, \'code\': \'1000\'}\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 1267, in completion\n    response = await litellm.atext_completion(**data)\n               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/utils.py", line 2363, in wrapper_async\n    raise e\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/utils.py", line 2255, in wrapper_async\n    result = await original_function(*args, **kwargs)\n             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/main.py", line 2528, in atext_completion\n    raise exception_type(\n          ^^^^^^^^^^^^^^^\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/utils.py", line 6594, in exception_type\n    raise e\n  File "/Users/jakekoenig/mentat/.venv/lib/python3.11/site-packages/litellm/utils.py", line 5616, in exception_type\n    raise APIError(\nlitellm.exceptions.APIError: OpenAIException - Error code: 400 - {\'object\': \'error\', \'message\': \'Expected last role to be user but got RoleEnum.system\', \'type\': \'internal_error_proxy\', \'param\': None, \'code\': \'1000\'}\n'}
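The 400 originates from Mistral's API, which rejects a request whose final message is not a `user` turn; per the error message, the proxy's text-completion path ends up sending a message list whose last role is `system` for this model. A minimal illustration of the constraint, calling Mistral directly through litellm (message contents here are made up, and both calls require `MISTRAL_API_KEY` to be set):

```python
# Illustration only: Mistral rejects message lists that do not end with a user turn.
import litellm

# Accepted: the last (and only) message has role "user".
litellm.completion(
    model="mistral/mistral-medium",
    messages=[{"role": "user", "content": "write a short poem"}],
)

# Rejected with the same 400 seen above ("Expected last role to be user
# but got RoleEnum.system"): the last message has role "system".
litellm.completion(
    model="mistral/mistral-medium",
    messages=[
        {"role": "user", "content": "write a short poem"},
        {"role": "system", "content": "You are a helpful assistant."},
    ],
)
```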


### Twitter / LinkedIn details

_No response_
krrishdholakia commented 4 months ago

Should be live in the next release