LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.35k stars, 312 forks

Multi-context #881

Closed: PioneerMNDR closed this issue 1 month ago

PioneerMNDR commented 1 month ago

Hi, I have a question. I have 2 characters who communicate with each other, but each has their own character description in the context. For example:

John's Context:

  1. John's Description
  2. Dialogue

Emily's Context:

  1. Description of Emily
  2. Dialogue

The problem is that every time I switch characters, a complete BLAS pass over the prompt takes place, and it takes a very long time. I use the Kobold API. It would seem logical for the context to be kept separately for each genkey in multiuser mode. Or perhaps I don't understand how this works at a basic level.

LostRuins commented 1 month ago

Unfortunately this is not possible at the moment. To get context-shifting for 2 different contexts, you'll need to run 2 separate instances of KoboldCpp on different ports and switch between the endpoints as you use each one. With mmap enabled you might be able to avoid duplicating the model in memory, assuming you are not offloading to the GPU.
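
For illustration, here is a rough sketch of what swapping between the endpoints could look like from a client script. It assumes two instances are already running on ports 5001 and 5002 (the port numbers, the `ENDPOINTS` mapping, and the helper names `build_prompt` and `generate` are made up for this example), and it uses the `/api/v1/generate` endpoint of the Kobold API with only the `prompt` and `max_length` fields. Each character always talks to the same instance, so that instance only ever sees that character's description followed by the growing dialogue.

```python
import json
import urllib.request

# Assumption: one instance per character, started separately on these ports.
ENDPOINTS = {
    "John": "http://localhost:5001",
    "Emily": "http://localhost:5002",
}


def http_post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def build_prompt(description, dialogue):
    """Character description first, then the shared dialogue, one line each."""
    return description + "\n" + "\n".join(dialogue) + "\n"


def generate(character, description, dialogue, post=http_post_json, max_length=120):
    """Send the character's context to the instance dedicated to that character."""
    url = ENDPOINTS[character] + "/api/v1/generate"
    payload = {
        "prompt": build_prompt(description, dialogue),
        "max_length": max_length,
    }
    result = post(url, payload)
    return result["results"][0]["text"]
```

A turn for John would then be `generate("John", john_description, dialogue)`, and a turn for Emily would pass `"Emily"` with her description and the same dialogue list. The `post` parameter exists only so the network call can be replaced when trying the script without a running server.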