BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: InvokeModelWithResponseStream error when using Bedrock + Cohere + LibreChat #943

Closed. Manouchehri closed this issue 11 months ago.

Manouchehri commented 11 months ago

What happened?

Streaming requests to Cohere Command on Bedrock fail with an `InvokeModelWithResponseStream` ValidationException when sent via LibreChat. Relevant LibreChat code:

https://github.com/danny-avila/LibreChat/blob/822914d521dd3b3e09105bfd08d4a99b778e0303/api/app/clients/OpenAIClient.js#L391

Relevant log output

Traceback (most recent call last):
  File "/app/litellm/utils.py", line 1404, in wrapper_async
    result = await original_function(*args, **kwargs)
  File "/app/litellm/main.py", line 195, in acompletion
    raise exception_type(
  File "/app/litellm/utils.py", line 4565, in exception_type
    raise e
  File "/app/litellm/utils.py", line 3892, in exception_type
litellm.exceptions.BadRequestError: BedrockException - BedrockException - Traceback (most recent call last):
  File "/app/litellm/llms/bedrock.py", line 410, in completion
    response = client.invoke_model_with_response_stream(
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.9/site-packages/botocore/client.py", line 983, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: Malformed input request: 2 schema violations found, please reformat your input and try again.

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

krrishdholakia commented 11 months ago

Hey @Manouchehri, we had a bedrock/cohere error a couple of days ago that got fixed.

Are you on the latest version?

If so, can you share the input being sent (via `litellm --debug`)?
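
For reference, a minimal sketch of the equivalent direct SDK call with verbose logging turned on. The model name and params here are assumptions mirroring what LibreChat sends (per the debug output below); it requires AWS credentials in the environment:

```python
import litellm

litellm.set_verbose = True  # prints the raw request LiteLLM sends to the provider

# Params mirror the LibreChat request seen in the debug logs; model name assumed.
response = litellm.completion(
    model="bedrock/cohere.command-text-v14",
    messages=[{"role": "user", "content": "tell me a joke."}],
    temperature=1,
    top_p=1,
    frequency_penalty=0,  # OpenAI-only param
    presence_penalty=0,   # OpenAI-only param
    stop=["||>", "\nUser:", "<|diff_marker|>"],
    stream=True,
)
for chunk in response:
    print(chunk)
```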

Manouchehri commented 11 months ago

Yep, I'm on the latest version (094144de58de7604deb47c525bd3ae72576560d4).

Will do, one moment.

Manouchehri commented 11 months ago

Is this what you were looking for?

Request Sent from LiteLLM:

            response = client.invoke_model_with_response_stream(
                body={"prompt": "tell me a joke.", "temperature": 1, "p": 1, "frequency_penalty": 0, "presence_penalty": 0, "stop_sequences": ["||>", "\nUser:", "<|diff_marker|>"], "stream": true},
                modelId=cohere.command-text-v14,
                accept=accept,
                contentType=contentType
            )
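
(Side note: the debug print shows the body as a Python dict, but the actual call JSON-encodes it. Cohere's text-generation schema on Bedrock has no frequency/presence penalty fields, so the two schema violations are presumably `frequency_penalty` and `presence_penalty`. A hypothetical corrected call in plain boto3, with the region and the `accept`/`contentType` values filled in as assumptions:)

```python
import json
import boto3

# Region is an assumption; use whichever region hosts the model.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Same request minus the two OpenAI-only keys; the body must be a JSON string.
response = bedrock.invoke_model_with_response_stream(
    body=json.dumps({
        "prompt": "tell me a joke.",
        "temperature": 1,
        "p": 1,
        "stop_sequences": ["||>", "\nUser:", "<|diff_marker|>"],
        "stream": True,
    }),
    modelId="cohere.command-text-v14",
    accept="application/json",
    contentType="application/json",
)
for event in response["body"]:  # EventStream of chunks
    print(event)
```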
krrishdholakia commented 11 months ago

pretty much - thanks!

krrishdholakia commented 11 months ago

i'll work on repro'ing + pushing out a fix today. @Manouchehri

krrishdholakia commented 11 months ago

able to repro this problem - working on a fix.

krrishdholakia commented 11 months ago

fix pushed - https://github.com/BerriAI/litellm/commit/4e9aa0d3381fd7a180b4b442c416d3d2cd6fea88

@Manouchehri let me know if the issue persists after you set:

litellm_settings:
  drop_params: True
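
For context: `drop_params` tells LiteLLM to silently drop OpenAI-style params a provider doesn't support instead of forwarding them. In a full proxy `config.yaml` it sits alongside the model list; the entry below is illustrative, not the reporter's actual config:

```yaml
model_list:
  - model_name: command
    litellm_params:
      model: bedrock/cohere.command-text-v14

litellm_settings:
  drop_params: True  # drop unsupported params (e.g. frequency_penalty) before calling the provider
```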
Manouchehri commented 11 months ago

Now neither Cohere nor Claude works via Bedrock. 😓

INFO:     143.198.0.207:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
receiving data: {'model': 'claude-v2', 'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'messages': [{'role': 'user', 'content': 'hey'}]}
LiteLLM.Router: Inside async function with retries: args - (); kwargs - {'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'metadata': {'user_api_key': 'removed', 'model_group': 'claude-v2'}, 'request_timeout': 600, 'model': 'claude-v2', 'messages': [{'role': 'user', 'content': 'hey'}], 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x1093d1250>>, 'num_retries': 3}
LiteLLM.Router: async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x1093d1250>>
LiteLLM.Router: Inside _acompletion()- model: claude-v2; kwargs: {'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'metadata': {'user_api_key': 'removed', 'model_group': 'claude-v2'}, 'request_timeout': 600}
LiteLLM.Router: initial list of deployments: [{'model_name': 'claude-v2', 'litellm_params': {'model': 'bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2'}}]
get cache: cache key: 20-49:cooldown_models
get cache: cache result: ['bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2']
LiteLLM.Router: retrieve cooldown models: ['bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2']
LiteLLM.Router: cooldown deployments: ['bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2']
LiteLLM.Router: healthy deployments: length 0 []
LiteLLM.Router: An exception occurs
LiteLLM.Router: Trying to fallback b/w models
An error occurred: No models available

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
Results from router

Router stats

Total Calls made
bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2: 1

Success Calls made

Fail Calls made
bedrock/anthropic.claude-v2-ModelID-bedrock/anthropic.claude-v2: 1
Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/proxy/proxy_server.py", line 733, in chat_completion
    response = await llm_router.acompletion(**data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 211, in acompletion
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 207, in acompletion
    response = await asyncio.wait_for(self.async_function_with_fallbacks(**kwargs), timeout=timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 391, in async_function_with_fallbacks
    raise original_exception
  File "/Users/dave/Work/litellm/litellm/router.py", line 343, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 420, in async_function_with_retries
    raise original_exception
  File "/Users/dave/Work/litellm/litellm/router.py", line 403, in async_function_with_retries
    response = await original_function(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 244, in _acompletion
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 221, in _acompletion
    deployment = self.get_available_deployment(model=model, messages=messages)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 903, in get_available_deployment
    raise ValueError("No models available")
ValueError: No models available
INFO:     143.198.0.207:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
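
What seems to be happening per the router debug lines above: the lone `claude-v2` deployment was put on cooldown after the earlier failure, so the next request found zero healthy deployments and raised `No models available`. Roughly, as a sketch (not LiteLLM's actual code):

```python
def get_available_deployment(deployments, cooldown_models):
    # Deployments currently on cooldown are excluded from routing.
    healthy = [
        d for d in deployments
        if d["litellm_params"]["model"] not in cooldown_models
    ]
    if not healthy:
        # With the only deployment on cooldown, every request ends up here.
        raise ValueError("No models available")
    return healthy[0]
```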
krrishdholakia commented 11 months ago

@Manouchehri can you test it with this and let me know if it's resolved. this worked for me locally - https://github.com/BerriAI/litellm/commit/383dd53e86b800d68091364988174664711357da

Would recommend waiting till commits have passed our CI/CD pipeline before using them in prod (we run 100+ tests, so issues like these do get caught before a release is made).

Manouchehri commented 11 months ago

Using 5fc7cd28d988e7d50649fee88e0281ee95603074, Claude works again, but Cohere Command via Bedrock still doesn't work.

An error occurred: No models available

 Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`
Results from router

Router stats

Total Calls made
bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14: 2
bedrock/meta.llama2-13b-chat-v1-ModelID-bedrock/meta.llama2-13b-chat-v1: 1
bedrock/ai21.j2-ultra-v1-ModelID-bedrock/ai21.j2-ultra-v1: 1
bedrock/amazon.titan-text-express-v1-ModelID-bedrock/amazon.titan-text-express-v1: 1
azure/gpt-4-1106-preview-ModelID-azure/gpt-4-1106-previewos.environ/AZURE_API_KEY_UKSOUTH2023-10-01-previewhttps://uksouth-aimoda.openai.azure.com/: 1

Success Calls made
bedrock/meta.llama2-13b-chat-v1-ModelID-bedrock/meta.llama2-13b-chat-v1: 1
azure/gpt-4-1106-preview-ModelID-azure/gpt-4-1106-previewos.environ/AZURE_API_KEY_UKSOUTH2023-10-01-previewhttps://uksouth-aimoda.openai.azure.com/: 1

Fail Calls made
bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14: 2
bedrock/ai21.j2-ultra-v1-ModelID-bedrock/ai21.j2-ultra-v1: 1
bedrock/amazon.titan-text-express-v1-ModelID-bedrock/amazon.titan-text-express-v1: 1
Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/llms/bedrock.py", line 410, in completion
    response = client.invoke_model_with_response_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 983, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: Malformed input request: 2 schema violations found, please reformat your input and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/main.py", line 1173, in completion
    model_response = bedrock.completion(
                     ^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/llms/bedrock.py", line 505, in completion
    raise BedrockError(status_code=500, message=traceback.format_exc())
litellm.llms.bedrock.BedrockError: Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/llms/bedrock.py", line 410, in completion
    response = client.invoke_model_with_response_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 983, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: Malformed input request: 2 schema violations found, please reformat your input and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/main.py", line 188, in acompletion
    response =  await loop.run_in_executor(None, func_with_context)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/utils.py", line 1362, in wrapper
    raise e
  File "/Users/dave/Work/litellm/litellm/utils.py", line 1293, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/main.py", line 1398, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/utils.py", line 4632, in exception_type
    raise e
  File "/Users/dave/Work/litellm/litellm/utils.py", line 3959, in exception_type
    raise BadRequestError(
litellm.exceptions.BadRequestError: BedrockException - Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/llms/bedrock.py", line 410, in completion
    response = client.invoke_model_with_response_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 983, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: Malformed input request: 2 schema violations found, please reformat your input and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/router.py", line 403, in async_function_with_retries
    response = await original_function(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 244, in _acompletion
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 238, in _acompletion
    response = await litellm.acompletion(**{**data, "messages": messages, "caching": self.cache_responses, "client": model_client, **kwargs})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/utils.py", line 1458, in wrapper_async
    raise e
  File "/Users/dave/Work/litellm/litellm/utils.py", line 1404, in wrapper_async
    result = await original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/main.py", line 195, in acompletion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/utils.py", line 4632, in exception_type
    raise e
  File "/Users/dave/Work/litellm/litellm/utils.py", line 3959, in exception_type
    raise BadRequestError(
litellm.exceptions.BadRequestError: BedrockException - BedrockException - Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/llms/bedrock.py", line 410, in completion
    response = client.invoke_model_with_response_stream(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/botocore/client.py", line 983, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModelWithResponseStream operation: Malformed input request: 2 schema violations found, please reformat your input and try again.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dave/Work/litellm/litellm/proxy/proxy_server.py", line 733, in chat_completion
    response = await llm_router.acompletion(**data)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 211, in acompletion
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 207, in acompletion
    response = await asyncio.wait_for(self.async_function_with_fallbacks(**kwargs), timeout=timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 391, in async_function_with_fallbacks
    raise original_exception
  File "/Users/dave/Work/litellm/litellm/router.py", line 343, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 441, in async_function_with_retries
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 426, in async_function_with_retries
    response = await original_function(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 244, in _acompletion
    raise e
  File "/Users/dave/Work/litellm/litellm/router.py", line 221, in _acompletion
    deployment = self.get_available_deployment(model=model, messages=messages)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dave/Work/litellm/litellm/router.py", line 916, in get_available_deployment
    raise ValueError("No models available")
ValueError: No models available
INFO:     143.198.0.207:0 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
receiving data: {'model': 'command', 'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'messages': [{'role': 'user', 'content': 'morning'}]}
LiteLLM.Router: Inside async function with retries: args - (); kwargs - {'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'metadata': {'user_api_key': 'removed', 'model_group': 'command'}, 'request_timeout': 600, 'model': 'command', 'messages': [{'role': 'user', 'content': 'morning'}], 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x113c25910>>, 'num_retries': 3}
LiteLLM.Router: async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x113c25910>>
LiteLLM.Router: Inside _acompletion()- model: command; kwargs: {'temperature': 1, 'top_p': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'stop': ['||>', '\nUser:', '<|diff_marker|>'], 'user': 'removed', 'stream': True, 'metadata': {'user_api_key': 'removed', 'model_group': 'command'}, 'request_timeout': 600}
LiteLLM.Router: initial list of deployments: [{'model_name': 'command', 'litellm_params': {'model': 'bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14'}}, {'model_name': 'command', 'litellm_params': {'model': 'bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14'}}]
get cache: cache key: 10-20:cooldown_models
get cache: cache result: ['bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14']
LiteLLM.Router: retrieve cooldown models: ['bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14']
LiteLLM.Router: cooldown deployments: ['bedrock/cohere.command-text-v14-ModelID-bedrock/cohere.command-text-v14']
LiteLLM.Router: healthy deployments: length 0 []
LiteLLM.Router: An exception occurs
LiteLLM.Router: Trying to fallback b/w models
An error occurred: No models available
Manouchehri commented 11 months ago

This has been fixed, thanks! =D