anon998 / simple-proxy-for-tavern

GNU Affero General Public License v3.0

Fake OpenAI API for Kobold

A workaround to have more control over the prompt format when using SillyTavern and local models.

This script sits between SillyTavern and a backend such as Kobold and lets you control how the final prompt text looks. It works by presenting itself to SillyTavern as an OpenAI API, processing the conversation, and sending the resulting prompt text to the backend. By default, it includes a prompt format that works well with LLaMA models tuned to follow instructions.
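To make the flow concrete, here is a minimal sketch of that idea. This is not the proxy's actual code: the backend URL, port, prompt template, and generation fields are illustrative assumptions, loosely following the Kobold-style /api/v1/generate request shape and the OpenAI chat-completions response shape.

```js
import http from "node:http";

// Minimal sketch (NOT the proxy's actual code): accept an OpenAI-style
// chat completion request, flatten the messages into one instruct-style
// prompt, forward it to a Kobold-compatible backend, and wrap the reply
// in an OpenAI-shaped response. URL, port, template, and generation
// settings are illustrative assumptions.
const BACKEND = "http://127.0.0.1:5000/api/v1/generate";

http
  .createServer((req, res) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", async () => {
      const { messages } = JSON.parse(body);
      // Build a simple instruction-following prompt from the chat history.
      const prompt =
        messages.map((m) => `${m.role}: ${m.content}`).join("\n") +
        "\n### Response:\n";
      const r = await fetch(BACKEND, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ prompt, max_length: 250 }),
      });
      const { results } = await r.json();
      // Reply in the shape SillyTavern expects from an OpenAI endpoint.
      res.setHeader("content-type", "application/json");
      res.end(
        JSON.stringify({
          choices: [
            { message: { role: "assistant", content: results[0].text } },
          ],
        })
      );
    });
  })
  .listen(29172); // example port
```

The real proxy layers tokenization, message trimming, configurable prompt formats, and streaming on top of this basic loop.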

The LLaMA tokenizer requires a modern Node.js version to work; use the latest LTS release.

You need a local backend like KoboldAI, koboldcpp, llama.cpp, or Ooba in API mode to load the model. The proxy also works with the Horde, where volunteers share their GPUs online.

Table of Contents

- Installation
- Tavern Settings
- Notes
- Files
- Examples
- Changelog

Installation

You'll need SillyTavern, the proxy, and a backend running. This guide is for the proxy.

If you want to change the config, copy the file conf.default.mjs to conf.mjs and make your changes there; that way they aren't lost during updates. If you're going to use the Horde, set your API key and the models you want to use in that file.
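For illustration only, the Horde-related overrides in conf.mjs might look like the lines below. The option names here are hypothetical, so copy the real ones from conf.default.mjs; "0000000000" is the Horde's anonymous API key.

```js
// conf.mjs -- hypothetical option names; copy the real ones from
// conf.default.mjs. "0000000000" is the Horde's anonymous API key.
export const hordeApiKey = "0000000000";
export const hordeModels = ["koboldcpp/mythomax-l2-13b"]; // example model name
```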

Generation presets and prompt-format presets now live in the presets/ and prompt-formats/ folders, respectively.
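As a quick way to see what a generation preset contains, something like this works in a modern Node.js; the file name, .json extension, and fields are assumptions, so look inside presets/ for the real files.

```js
import { readFile } from "node:fs/promises";

// Print one generation preset. The file name and extension are
// assumptions; check the presets/ folder for the actual files.
const preset = JSON.parse(await readFile("presets/default.json", "utf8"));
console.log(preset);
```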

Tavern Settings

Download alpaca.settings, put it in SillyTavern/public/OpenAI Settings/, and reload (or start) Tavern. Some of the values in the next steps will then already be filled in.

Press the second button on the top panel, select "OpenAI" as the API, and enter any random string as the API key; the value doesn't matter. [API connections screenshot]

Press the first button and select the "alpaca" preset. If it was already selected, you may need to switch to Default and back to alpaca for the settings to load correctly.

If the preset doesn't exist, create it. In older versions of Tavern, the button may be at the bottom of that panel or to the right of the select box.

[settings screenshot]

Press the second button on the top panel again and select "Connect".

Notes

Leave Context Size high so Tavern doesn't truncate the messages; truncation is handled by this script instead.

Tavern settings like Temperature, Max Response Length, etc. are ignored. Edit generationPreset in conf.mjs to select a preset; the presets are located in the presets/ directory. There's also a replyAttributes variable that, by default, alters the prompt to nudge the AI toward giving more descriptive responses. See the sketch after these notes.

If you want the character's example messages to always stay in the prompt, set keepExampleMessagesInPrompt in conf.mjs and also enable the option in the Tavern UI.
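As a rough sketch, those conf.mjs overrides might look like this. The option names come from the notes above, but the values and the export style are assumptions; check conf.default.mjs for the real defaults.

```js
// conf.mjs -- hypothetical values; the option names are from the notes
// above, the values and export style are assumptions (see conf.default.mjs).
export const generationPreset = "default"; // must match a preset in presets/
export const replyAttributes = ""; // e.g. empty to drop the extra prompt text
export const keepExampleMessagesInPrompt = true; // also enable it in the Tavern UI
```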

The last prompt is saved as prompt.txt. You can use it to check that the prompt is being generated the way you expect.

Ooba needs to be started with --extensions api. Its streaming API was only added on Apr 23, 2023, so make sure your installation is at least that recent.
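A quick way to confirm the Ooba API is reachable is a small Node script like the one below; the port, endpoint, and response shape match the old text-generation-webui API extension's usual defaults, but treat them as assumptions for your setup.

```js
// Smoke test against Ooba's blocking API extension. Port 5000 and the
// /api/v1/generate endpoint are that extension's usual defaults, but
// verify them for your setup. Save as test.mjs and run: node test.mjs
const res = await fetch("http://127.0.0.1:5000/api/v1/generate", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ prompt: "Hello,", max_new_tokens: 16 }),
});
const data = await res.json();
console.log(data.results[0].text);
```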

Files

Examples

Rentry with examples from /lmg/. [RP example screenshot]

Changelog

See the changelog in the repository.