BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Garbled/non-sense response when using LiteLLM to proxy Ollama in LibreChat #2699

Closed · K-J-VV closed this 6 months ago

K-J-VV commented 6 months ago

What happened?

I am trying to use LiteLLM to proxy Ollama for LibreChat. However, when I ask LiteLLM anything, it responds, but the text comes out as garbled nonsense. When I ask Ollama directly, the response is great. Screenshots are below.

Response via LiteLLM, proxying Ollama

[screenshot]

Response via Ollama, directly

[screenshot]

Here is the content of my /app/config.yaml for LiteLLM:

model_list:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params:
      model: ollama/dolphin-mistral
      api_base: https://ollama.example.com
      stream: True

litellm_settings:
  drop_params: True

general_settings: 
  master_key: sk-xxxxx # [OPTIONAL] if set all calls to proxy will require either this key or a valid generated token

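To check whether the garbling originates at the proxy or in LibreChat, the proxy can be queried directly through its OpenAI-compatible endpoint. This is a minimal sketch; the host, key, and model alias are the placeholders from the config above, not confirmed values:

    # Minimal sanity check against the LiteLLM proxy, bypassing LibreChat.
    # Host, key, and model alias are the placeholders from config.yaml above.
    import requests

    resp = requests.post(
        "https://litellm.example.com/v1/chat/completions",
        headers={"Authorization": "Bearer sk-xxxxx"},
        json={
            "model": "gpt-3.5-turbo",  # user-facing alias from model_list
            "messages": [{"role": "user", "content": "Say hello"}],
            "stream": False,           # non-streaming, to rule out chunk handling
        },
        timeout=60,
    )
    print(resp.status_code)
    print(resp.json())

If the text is already garbled here, the problem sits between LiteLLM and Ollama; if it is clean, the problem is in how the proxied stream reaches LibreChat.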
Here is the content of my /app/librechat.yaml file

version: 1.0.1
cache: true
endpoints:
  custom:
    - name: "LiteLLM"
      apiKey: "sk-xxxxx"
      baseURL: "https://litellm.example.com/v1/"
      models:
        default: ["gpt-3.5-turbo"]
        fetch: true
      titleConvo: true
      titleModel: "gpt-3.5-turbo"
      summarize: false
      summaryModel: "gpt-3.5-turbo"
      forcePrompt: false
      modelDisplayLabel: "LiteLLM"

    - name: "Ollama"
      apiKey: "ollama"
      baseURL: "https://ollama.example.com/v1/"
      models:
        default: ["dolphin-mistral"]
        fetch: false # fetching list of models is not supported
      titleConvo: true
      titleModel: "dolphin-mistral"
      summarize: false
      summaryModel: "dolphin-mistral"
      forcePrompt: false
      modelDisplayLabel: "dolphin"
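
One detail worth noting against the logs below: LibreChat's model fetch requests hit /v1//models (double slash) and get a 404, which matches the trailing slash on the baseURL values above. Dropping the trailing slash is a plausible fix for the "Failed to fetch models" errors (this is an inference from the 404s, not a confirmed fix):

      baseURL: "https://litellm.example.com/v1"  # no trailing slash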

Relevant log output

////////////////////////////////
Log from LiteLLM

FYI, the 10.10.10.10 address is the local IP of my reverse proxy (NGINX Proxy Manager)
/////////////////////////////////

#------------------------------------------------------------#
#                                                            #
#       'This feature doesn't meet my needs because...'       #
#        https://github.com/BerriAI/litellm/issues/new        #
#                                                            #
#------------------------------------------------------------#
 Thank you for using LiteLLM! - Krrish & Ishaan
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM: Proxy initialized with Config, Set models:
    gpt-3.5-turbo
LiteLLM_VerificationTokenView Created!
MonthlyGlobalSpend Created!
Last30dKeysBySpend Created!
Last30dModelsBySpend Created!
MonthlyGlobalSpendPerKey Created!
Last30dTopEndUsersSpend Created!
INFO:     10.10.10.10:52332 - "GET /v1/models HTTP/1.1" 401 Unauthorized
INFO:     10.10.10.10:52342 - "GET /v1//models HTTP/1.1" 404 Not Found
INFO:     10.10.10.10:51146 - "GET /v1//models HTTP/1.1" 404 Not Found
INFO:     10.10.10.10:51150 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:52634 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:52638 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:52648 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     10.10.10.10:42388 - "GET /v1//models HTTP/1.1" 404 Not Found
INFO:     10.10.10.10:42400 - "POST /v1/chat/completions HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:42402 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:42416 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/proxy_server.py", line 3219, in chat_completion
    response = await proxy_logging_obj.post_call_success_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/litellm/proxy/utils.py", line 449, in post_call_success_hook
    new_response = copy.deepcopy(response)
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/copy.py", line 161, in deepcopy
    rv = reductor(4)
         ^^^^^^^^^^^
TypeError: cannot pickle 'async_generator' object
INFO:     10.10.10.10:42420 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

/////////////////////////////////////////
Log from LibreChat
/////////////////////////////////////////

> LibreChat@0.6.10 backend
> cross-env NODE_ENV=production node api/server/index.js
2024-03-26 15:09:35 info: [Optional] Redis not initialized. Note: Redis support is experimental.
2024-03-26 15:09:36 info: Connected to MongoDB
2024-03-26 15:09:36 info: Custom config file loaded:
2024-03-26 15:09:36 info: {
  "version": "1.0.1",
  "cache": true,
  "endpoints": {
    "custom": [
      {
        "name": "LiteLLM",
        "apiKey": "sk-xxxxx",
        "baseURL": "https://litellm.example.com/v1/",
        "models": {
          "default": [
            "gpt-3.5-turbo"
          ],
          "fetch": true
        },
        "titleConvo": true,
        "titleModel": "gpt-3.5-turbo",
        "summarize": false,
        "summaryModel": "gpt-3.5-turbo",
        "forcePrompt": false,
        "modelDisplayLabel": "LiteLLM"
      },
      {
        "name": "Ollama",
        "apiKey": "ollama",
        "baseURL": "https://ollama.example.com/v1/",
        "models": {
          "default": [
            "dolphin-mistral"
          ],
          "fetch": false
        },
        "titleConvo": true,
        "titleModel": "dolphin-mistral",
        "summarize": false,
        "summaryModel": "dolphin-mistral",
        "forcePrompt": false,
        "modelDisplayLabel": "dolphin"
      }
    ]
  }
}
2024-03-26 15:09:36 info: 
Outdated Config version: 1.0.1. Current version: 1.0.3
Check out the latest config file guide for new options and features.
https://docs.librechat.ai/install/configuration/custom_config.html
Warning: connect.session() MemoryStore is not
designed for a production environment, as it will leak
memory, and will not scale past a single process.
2024-03-26 15:09:36 info: Server listening on all interfaces at port 3080. Use http://localhost:3080 to access it
2024-03-26 15:09:48 error: Failed to fetch models from OpenAI API Something happened in setting up the request Cannot read properties of undefined (reading 'status')
2024-03-26 15:09:48 error: Failed to fetch models from LiteLLM API The request was made and the server responded with a status code that falls out of the range of 2xx: Request failed with status code 404
2024-03-26 15:12:42 error: Failed to fetch models from Ollama API The request was made and the server responded with a status code that falls out of the range of 2xx: Request failed with status code 404
2024-03-26 15:12:43 error: [MeiliMongooseModel.findOneAndUpdate] Convo not found in MeiliSearch and will index a315aa51-6dc1-4f2b-97ab-f9c52a16ff6d Document `a315aa51-6dc1-4f2b-97ab-f9c52a16ff6d` not found.
2024-03-26 15:12:45 warn: [OpenAIClient.chatCompletion][finalChatCompletion] Aborted Message
2024-03-26 15:12:45 warn: [OpenAIClient.chatCompletion][finalChatCompletion] API error
2024-03-26 15:12:53 error: Failed to fetch models from LiteLLM API The request was made and the server responded with a status code that falls out of the range of 2xx: Request failed with status code 404
2024-03-26 15:12:53 error: [MeiliMongooseModel.findOneAndUpdate] Convo not found in MeiliSearch and will index 91a1230f-e151-4598-9024-593002e9f8e1 Document `91a1230f-e151-4598-9024-593002e9f8e1` not found.
2024-03-26 15:12:56 warn: [OpenAIClient.chatCompletion][create] API error
2024-03-26 15:12:56 error: [OpenAIClient.chatCompletion] Unhandled error type Error: 500 cannot pickle 'async_generator' object
2024-03-26 15:12:56 error: [OpenAIClient] There was an issue generating the title with the completion method Error: 500 cannot pickle 'async_generator' object
2024-03-26 15:13:44 error: Failed to fetch models from LiteLLM API The request was made and the server responded with a status code that falls out of the range of 2xx: Request failed with status code 404
2024-03-26 15:13:44 error: [MeiliMongooseModel.findOneAndUpdate] Convo not found in MeiliSearch and will index a2fae8a5-1f93-4a95-86aa-5d708042261e Document `a2fae8a5-1f93-4a95-86aa-5d708042261e` not found.
2024-03-26 15:13:47 warn: [OpenAIClient.chatCompletion][create] API error
2024-03-26 15:13:47 error: [OpenAIClient.chatCompletion] Unhandled error type Error: 500 cannot pickle 'async_generator' object
2024-03-26 15:13:47 error: [OpenAIClient] There was an issue generating the title with the completion method Error: 500 cannot pickle 'async_generator' object
2024-03-26 15:15:01 error: Failed to fetch models from Ollama API The request was made and the server responded with a status code that falls out of the range of 2xx: Request failed with status code 404
2024-03-26 15:15:01 error: [MeiliMongooseModel.findOneAndUpdate] Convo not found in MeiliSearch and will index 889c72d6-12f2-4bb3-a12a-6eabed542253 Document `889c72d6-12f2-4bb3-a12a-6eabed542253` not found.
2024-03-26 15:15:13 warn: [OpenAIClient.chatCompletion][finalChatCompletion] Aborted Message
2024-03-26 15:15:13 warn: [OpenAIClient.chatCompletion][finalChatCompletion] API error
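
The repeated 500s in the LiteLLM log trace back to copy.deepcopy being called on a streaming (async generator) response inside the proxy's post_call_success_hook; async generators have no pickle support, and deepcopy falls back to the pickle reduce protocol, hence the TypeError. With stream: True hard-coded under litellm_params, every proxied call is presumably streaming, which would explain why each chat completion hits this path. A minimal reproduction of the underlying TypeError, independent of LiteLLM:

    # Demonstrates the error in the traceback: deepcopy on an async generator
    # falls back to pickling, which async generators do not support.
    import copy

    async def stream_chunks():
        yield "chunk"

    gen = stream_chunks()
    try:
        copy.deepcopy(gen)
    except TypeError as err:
        print(err)  # -> cannot pickle 'async_generator' object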

Twitter / LinkedIn details

No response

K-J-VV commented 6 months ago

Looks like the LiteLLM documentation may need updating? I was able to get help over at LibreChat; the answer can be found here: https://github.com/danny-avila/LibreChat/discussions/2215
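
The linked answer is not quoted in this thread. One variant often suggested for garbled Ollama output through LiteLLM (an assumption here, not confirmed to be the fix from that discussion) is to use the ollama_chat/ prefix so the proxy calls Ollama's chat endpoint rather than the raw generate endpoint, and to let the client control streaming instead of forcing it in litellm_params:

    model_list:
      - model_name: gpt-3.5-turbo
        litellm_params:
          model: ollama_chat/dolphin-mistral   # assumption: chat endpoint avoids re-templating the prompt
          api_base: https://ollama.example.com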

shuther commented 5 months ago

@K-J-VV could you detail the change you applied as I am facing the same issue?

shuther commented 5 months ago

I found this change, but I'm still facing the problem with every model (llama3, mistral).

K-J-VV commented 5 months ago

> @K-J-VV could you detail the change you applied as I am facing the same issue?

Just realized my comment above had an incorrect hyperlink to the solution; I've updated it, hope it helps!