LLProxy

LLProxy was designed to effectively manage rate limits and schedule workload across multiple different LLM-based applications. The rate limits for these services are complex, beyond what can easily be configured with the simplest of reverse proxies. LLProxy addresses this by providing a scheduler that deeply understands the core LLM providers' rate-limiting behavior.
Supported providers: openai
Scheduling: FIFO
Set up your configuration file:
cp config-example.json config.json
Each provider can be defined as a specific route.
config.json
{
"routes": {
"openai": {
"forward": "https://api.openai.com",
"provider": "openai",
"models": {
"gpt-4": {
"maxQueueSize": 10,
"maxQueueWait": 30,
"rpm": 200,
"tpm": 40000
},
...
}
}
...
}
}
The above creates a route at http://proxyhost:8080/openai/... and forwards all traffic sent to that route to https://api.openai.com/...
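For illustration only, here is a minimal sketch of this kind of prefix-stripping forwarding using Go's standard library. This is not LLProxy's actual code; the /openai prefix, port 8080, and upstream URL simply mirror the route above.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Upstream target, taken from the "forward" field of the route above.
	target, err := url.Parse("https://api.openai.com")
	if err != nil {
		log.Fatal(err)
	}

	// Rewrite incoming requests so they are sent on to the target host.
	proxy := httputil.NewSingleHostReverseProxy(target)
	director := proxy.Director
	proxy.Director = func(req *http.Request) {
		director(req)
		req.Host = target.Host // make sure the upstream sees its own host name
	}

	// Strip the "/openai" route prefix so that, e.g.,
	// /openai/v1/chat/completions is forwarded as /v1/chat/completions.
	http.Handle("/openai/", http.StripPrefix("/openai", proxy))

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```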
It further defines a scheduler for the gpt-4 model that sets:
- maxQueueSize: how many requests are allowed to sit in the queue prior to being scheduled
- maxQueueWait: how long, in seconds, a request is allowed to wait before the proxy starts rejecting additional requests with RateLimit errors
- rpm: the maximum requests per minute
- tpm: the maximum tokens per minute

Requests and tokens per minute are consumed as requests come in and recover over time. If a request cannot be immediately processed, it will sit in the queue for up to maxQueueWait seconds, and up to maxQueueSize items can be outstanding in the queue.
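To make these semantics concrete, below is a rough, illustrative sketch (not LLProxy's actual scheduler) of a per-model limiter that spends rpm/tpm budget as requests arrive, recovers it over time, bounds the number of queued requests at maxQueueSize, and returns a RateLimit-style error once a request has waited longer than maxQueueWait. The type and function names are invented for this example, and the real queue ordering and rejection behavior may differ from this simplification.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sync"
	"time"
)

// ErrRateLimit stands in for the RateLimit error described above.
var ErrRateLimit = errors.New("rate limited: queue full or maxQueueWait exceeded")

// modelLimiter is an invented, simplified stand-in for the per-model scheduler.
type modelLimiter struct {
	mu         sync.Mutex
	rpm, tpm   float64 // per-minute budgets from the config
	reqBudget  float64 // remaining request budget
	tokBudget  float64 // remaining token budget
	lastRefill time.Time

	queue        chan struct{} // bounded queue: at most maxQueueSize waiters
	maxQueueWait time.Duration
}

func newModelLimiter(rpm, tpm float64, maxQueueSize int, maxQueueWait time.Duration) *modelLimiter {
	return &modelLimiter{
		rpm: rpm, tpm: tpm,
		reqBudget:    rpm,
		tokBudget:    tpm,
		lastRefill:   time.Now(),
		queue:        make(chan struct{}, maxQueueSize),
		maxQueueWait: maxQueueWait,
	}
}

// refill recovers budget in proportion to elapsed time, capped at one minute's
// worth -- the "consumed as requests come in and recover over time" behavior.
func (m *modelLimiter) refill() {
	now := time.Now()
	elapsed := now.Sub(m.lastRefill).Minutes()
	m.lastRefill = now
	m.reqBudget = math.Min(m.rpm, m.reqBudget+elapsed*m.rpm)
	m.tokBudget = math.Min(m.tpm, m.tokBudget+elapsed*m.tpm)
}

// tryConsume spends budget for one request costing `tokens` tokens, if possible.
func (m *modelLimiter) tryConsume(tokens float64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.refill()
	if m.reqBudget < 1 || m.tokBudget < tokens {
		return false
	}
	m.reqBudget--
	m.tokBudget -= tokens
	return true
}

// Acquire admits a request once budget is available, or fails with ErrRateLimit
// when the queue already holds maxQueueSize requests or the wait exceeds maxQueueWait.
func (m *modelLimiter) Acquire(tokens float64) error {
	select {
	case m.queue <- struct{}{}: // take a queue slot
	default:
		return ErrRateLimit // queue is full
	}
	defer func() { <-m.queue }() // release the slot when done waiting

	deadline := time.Now().Add(m.maxQueueWait)
	for {
		if m.tryConsume(tokens) {
			return nil
		}
		if time.Now().After(deadline) {
			return ErrRateLimit
		}
		time.Sleep(50 * time.Millisecond) // poll; a real scheduler can wake waiters precisely
	}
}

func main() {
	// Values mirror the gpt-4 entry in the example config above.
	limiter := newModelLimiter(200, 40000, 10, 30*time.Second)
	if err := limiter.Acquire(1200); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("admitted: forward the request upstream")
}
```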
Set a config for every model you want to support.
[Optional] Run tests
./test.sh
[Optional] Look at code coverage
go tool cover -html=coverage.out -o coverage.html
Build the application
./build.sh
Run the application
./llproxy
Direct traffic to your proxy server
import openai
# Point the OpenAI client at the proxy's openai route instead of the provider directly.
openai.api_base = 'http://<your-proxy-address>:8080/openai/v1'
...