anon998 / simple-proxy-for-tavern

GNU Affero General Public License v3.0

Fake OpenAI API for Kobold

A workaround to have more control over the prompt format when using SillyTavern and local models.

This script sits between SillyTavern and a backend such as Kobold and lets you control how the final prompt text looks. It works by presenting itself to SillyTavern as an OpenAI API, processing the conversation, and sending the resulting prompt text to the backend. By default, it includes a prompt format that works well with LLaMA models tuned to follow instructions.
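To make the flow concrete, here is a minimal sketch of that idea. This is not the proxy's actual code: the backend URL, port, prompt template, and generation fields are illustrative assumptions, loosely following the Kobold-style /api/v1/generate request shape and the OpenAI chat-completions response shape.

```js
import http from "node:http";

// Minimal sketch (NOT the proxy's actual code): accept an OpenAI-style
// chat completion request, flatten the messages into one instruct-style
// prompt, forward it to a Kobold-compatible backend, and wrap the reply
// in an OpenAI-shaped response. URL, port, template, and generation
// settings are illustrative assumptions.
const BACKEND = "http://127.0.0.1:5000/api/v1/generate";

http
  .createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", async () => {
      const { messages } = JSON.parse(body);
      // Build a simple instruction-following prompt from the chat history.
      const prompt =
        messages.map((m) => `${m.role}: ${m.content}`).join("\n") +
        "\n### Response:\n";
      const r = await fetch(BACKEND, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ prompt, max_length: 250 }),
      });
      const { results } = await r.json();
      // Reply in the shape SillyTavern expects from an OpenAI endpoint.
      res.setHeader("content-type", "application/json");
      res.end(
        JSON.stringify({
          choices: [
            { message: { role: "assistant", content: results[0].text } },
          ],
        })
      );
    });
  })
  .listen(29172); // example port
```

The real proxy layers tokenization, message trimming, configurable prompt formats, and streaming on top of this basic loop.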

The LLaMA tokenizer requires a modern Node.js version to work; use the latest LTS release.

You need a local backend like KoboldAI, koboldcpp, llama.cpp, or Ooba in API mode to load the model. The proxy also works with the Horde, where volunteers share their GPUs online.

Table of Contents

- Installation
- Tavern Settings
- Notes
- Files
- Examples
- Changelog

Installation

You'll need SillyTavern, the proxy, and a backend running. This guide is for the proxy.

If you want to change the config, copy the file conf.default.mjs to conf.mjs and make your changes there; that way they aren't lost during updates. If you're going to use the Horde, set your API key and the models you want to use in that file.
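For illustration only, the Horde-related overrides in conf.mjs might look like the lines below. The option names here are hypothetical, so copy the real ones from conf.default.mjs; "0000000000" is the Horde's anonymous API key.

```js
// conf.mjs -- hypothetical option names; copy the real ones from
// conf.default.mjs. "0000000000" is the Horde's anonymous API key.
export const hordeApiKey = "0000000000";
export const hordeModels = ["koboldcpp/mythomax-l2-13b"]; // example model name
```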

Generation presets and prompt-format presets now live in the presets/ and prompt-formats/ folders, respectively.
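As a quick way to see what a generation preset contains, something like this works in a modern Node.js; the file name, .json extension, and fields are assumptions, so look inside presets/ for the real files.

```js
import { readFile } from "node:fs/promises";

// Print one generation preset. The file name and extension are
// assumptions; check the presets/ folder for the actual files.
const preset = JSON.parse(await readFile("presets/default.json", "utf8"));
console.log(preset);
```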

Tavern Settings

Download alpaca.settings, put it in SillyTavern/public/OpenAI Settings/, and reload (or start) Tavern. Some of the values in the next steps will then already be filled in.

Press the second button on the top panel, select "OpenAI" as the API, and enter any random string as the API key; the value doesn't matter. [API connections screenshot]

Press the first button and select the "alpaca" preset. If it was already selected, you may need to switch to Default and back to alpaca for the settings to load correctly.

If the preset doesn't exist, create it. In older versions of Tavern, the button may be at the bottom of that panel or to the right of the select box.

[settings screenshot]

Press the second button on the top panel again and select "Connect".

Notes

Leave Context Size high so Tavern doesn't truncate the messages; truncation is handled by this script instead.

Tavern settings like Temperature, Max Response Length, etc. are ignored. Edit generationPreset in conf.mjs to select a preset; the presets are located in the presets/ directory. There's also a replyAttributes variable that, by default, alters the prompt to nudge the AI toward giving more descriptive responses. See the sketch after these notes.

If you want the character's example messages to always stay in the prompt, set keepExampleMessagesInPrompt in conf.mjs and also enable the option in the Tavern UI.
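As a rough sketch, those conf.mjs overrides might look like this. The option names come from the notes above, but the values and the export style are assumptions; check conf.default.mjs for the real defaults.

```js
// conf.mjs -- hypothetical values; the option names are from the notes
// above, the values and export style are assumptions (see conf.default.mjs).
export const generationPreset = "default"; // must match a preset in presets/
export const replyAttributes = ""; // e.g. empty to drop the extra prompt text
export const keepExampleMessagesInPrompt = true; // also enable it in the Tavern UI
```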

The last prompt is saved as prompt.txt. You can use it to check that the prompt is being generated the way you expect.

Ooba needs to be started with --extensions api. Its streaming API was only added on Apr 23, 2023, so make sure your installation is at least that recent.
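A quick way to confirm the Ooba API is reachable is a small Node script like the one below; the port, endpoint, and response shape match the old text-generation-webui API extension's usual defaults, but treat them as assumptions for your setup.

```js
// Smoke test against Ooba's blocking API extension. Port 5000 and the
// /api/v1/generate endpoint are that extension's usual defaults, but
// verify them for your setup. Save as test.mjs and run: node test.mjs
const res = await fetch("http://127.0.0.1:5000/api/v1/generate", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ prompt: "Hello,", max_new_tokens: 16 }),
});
const data = await res.json();
console.log(data.results[0].text);
```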

Files

Examples

Rentry with examples from /lmg/. [RP example screenshot]

Changelog

See the changelog in the repository.