Prerequisites

Please answer the following questions for yourself before submitting an issue.

[TRUE ] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
[ TRUE] I carefully followed the README.md.
[ TRUE] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[ TRUE] I reviewed the Discussions, and have a new bug or useful enhancement to share.

llama-cpp-python has a OpenAI compatible server

I am serving a model as:

GGML_SYCL_DEVICE=0 python3 -m llama_cpp.server --model mistral-7b-instruct-v0.2.Q8_0.gguf --chat_format mistral-instruct --host 0.0.0.0 --port 7836 --n_ctx 16192 --n_gpu_layers 35

Output from llama-cpp-python:

INFO:     Started server process [12022]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:7836 (Press CTRL+C to quit)
INFO:     192.168.3.114:51186 - "GET /v1/models HTTP/1.1" 200 OK # Librechat is requesting
INFO:     192.168.3.114:51198 - "POST /v1 HTTP/1.1" 404 Not Found # Librechat is requesting

I have checked http://192.168.3.113:7836/docs works

This is my librechat output when I try to chat with my model

Something went wrong. Here's the specific error message we encountered: Failed to send message. HTTP 404 - {"detail":"Not Found"}

This is my librechat.yaml configuration

# Configuration version (required)
version: 1.0.1

# Cache settings: Set to true to enable caching
cache: true

# Definition of custom endpoints
endpoints:
  custom:

    - name: "llama"
      apiKey: "sk-1234"
      baseURL: "http://192.168.3.113:7836/v1" # I have tried with just "http://192.168.3.113:7836" also, librechat again hits invalid endpoint
      iconURL: "<some url>"
      models:
        default: ["mistral-7b-v0.1", "mistral-7b-instruct-v0.2"] # I was loading multiple models first, but I was trying to diagnose the error with 1 model only
        fetch: true
      titleConvo: true
      titleModel: "mistral-7b-v0.1"
      titleMethod: "completion"
      summarize: true
      summaryModel: "mistral-7b-v0.1"
      forcePrompt: true
      modelDisplayLabel : "llama"

Instead of the yaml file, I have also tried to modify the OPENAI_REVERSE_PROXY in .env file. (the old way of librechat endpoints)

Possibilities:

llama-cpp-python is not serving a OpenAI compatible server or the api endpoints are incomplete or not redirecting properly
I am missing some configuration in Librechat, since chat format is --chat_format mistral-instruct
I am missing some configuration for llama-cpp-python with chat format is --chat_format mistral-instruct

abetlen / llama-cpp-python

Compatibility with LibreChat: According to Librechat, the server is not serving a OpenAI complaint API for mistral models with llama.cpp chat format `mistral-instruct` #1174

Prerequisites