Popular AI company compatible LLM Server

This is a very simple Flask application that provides a Popular AI company compatible API for other large language models.

This is very useful if you run tests or have lots of Collaborative Agent Modules running :-)

It currently supports Llama2, Mistral-7b and RWKV. These models run fairly easily on local hardware, which makes them a great fit for the agent use case.

Streaming is supported as well.
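
As a rough sketch of how a client might consume a streamed reply, the helper below parses server-sent-event lines. It assumes the server emits lines of the form data: {json} with the text under choices[].delta.content, and ends the stream with data: [DONE]. That is the format the compatible API is commonly known for, but this README does not confirm it for this server. The name iter_stream_deltas is hypothetical.

```python
import json


def iter_stream_deltas(lines):
    """Yield text fragments from an iterable of raw event-stream lines (bytes).

    Assumption: each event is a line like b'data: {...json...}' and the
    stream ends with b'data: [DONE]'. The chunk layout
    (choices -> delta -> content) is also an assumption, not taken from
    this server's code.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

The response object returned by urllib.request.urlopen can be iterated line by line, so it could be passed straight to iter_stream_deltas after sending a request whose JSON body includes "stream": true (again assuming the server honors that field).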

Setup

1. Create a venv: python3 -m venv venv
2. Activate the venv: source venv/bin/activate (or venv\Scripts\activate on Windows)
3. Install dependencies: pip install -r requirements.txt
4. Create a symlink to your models. Example: ln -s /mnt/ssd/models/rwkv models/rwkv
5. Run the server: python app.py
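
A dangling symlink under models/ is an easy mistake to make in the symlink step. The sketch below is not part of the project. It is a hypothetical helper (check_model_links) that reports whether each entry under models/ resolves to something that exists. It assumes only the models/ directory layout shown in the example above.

```python
from pathlib import Path


def check_model_links(models_dir="models"):
    """Return {entry name: True if it resolves to an existing path}.

    Returns an empty dict when models_dir itself does not exist.
    """
    root = Path(models_dir)
    if not root.is_dir():
        return {}
    # Path.exists() follows symlinks, so a dangling link reports False.
    return {entry.name: entry.exists() for entry in sorted(root.iterdir())}
```

Running check_model_links() from the project directory before starting the server gives a quick view of which model links are usable. Any entry mapped to False points at a missing target.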

Sending Requests

curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer WE_DONT_NEED_NO_STINKING_TOKENS" \
  -d '{
    "model": "mistral-7b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
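
The same request can be sent from Python. This is a minimal sketch using only the standard library. The function names build_chat_request and send_chat_request are hypothetical, and the port, path, headers and payload are copied from the curl example above.

```python
import json
import urllib.request

# Assumption: the server listens on the address used in the curl example.
BASE_URL = "http://localhost:5000"


def build_chat_request(model, user_message, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder token, same as in the curl example.
            "Authorization": "Bearer WE_DONT_NEED_NO_STINKING_TOKENS",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the response body parsed as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send_chat_request(build_chat_request("mistral-7b-instruct", "Hello!")) sends the same payload as the curl command and returns the parsed JSON body.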
