abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License
7.15k stars 848 forks source link

Compatibility with LibreChat: According to Librechat, the server is not serving a OpenAI complaint API for mistral models with llama.cpp chat format `mistral-instruct` #1174

Open 0x33taji opened 4 months ago

0x33taji commented 4 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

llama-cpp-python has a OpenAI compatible server

I am serving a model as:

GGML_SYCL_DEVICE=0 python3 -m llama_cpp.server --model mistral-7b-instruct-v0.2.Q8_0.gguf --chat_format mistral-instruct --host 0.0.0.0 --port 7836 --n_ctx 16192 --n_gpu_layers 35 

Output from llama-cpp-python:

INFO:     Started server process [12022]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7836 (Press CTRL+C to quit)
INFO:     192.168.3.114:51186 - "GET /v1/models HTTP/1.1" 200 OK # Librechat is requesting
INFO:     192.168.3.114:51198 - "POST /v1 HTTP/1.1" 404 Not Found # Librechat is requesting

I have checked http://192.168.3.113:7836/docs works

This is my librechat output when I try to chat with my model

Something went wrong. Here's the specific error message we encountered: Failed to send message. HTTP 404 - {"detail":"Not Found"}

This is my librechat.yaml configuration

# Configuration version (required)
version: 1.0.1

# Cache settings: Set to true to enable caching
cache: true

# Definition of custom endpoints
endpoints:
  custom:

    - name: "llama"
      apiKey: "sk-1234"
      baseURL: "http://192.168.3.113:7836/v1" # I have tried with just "http://192.168.3.113:7836" also, librechat again hits invalid endpoint
      iconURL: "<some url>"
      models:
        default: ["mistral-7b-v0.1", "mistral-7b-instruct-v0.2"] # I was loading multiple models first, but I was trying to diagnose the error with 1 model only
        fetch: true
      titleConvo: true
      titleModel: "mistral-7b-v0.1"
      titleMethod: "completion"
      summarize: true
      summaryModel: "mistral-7b-v0.1"
      forcePrompt: true
      modelDisplayLabel : "llama"

Instead of the yaml file, I have also tried to modify the OPENAI_REVERSE_PROXY in .env file. (the old way of librechat endpoints)

Possibilities:

  1. llama-cpp-python is not serving a OpenAI compatible server or the api endpoints are incomplete or not redirecting properly
  2. I am missing some configuration in Librechat, since chat format is --chat_format mistral-instruct
  3. I am missing some configuration for llama-cpp-python with chat format is --chat_format mistral-instruct
0x33taji commented 4 months ago

https://github.com/danny-avila/LibreChat/issues/1769