LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Output stops abruptly after ~20 messages #972

Open DarkSRKI24 opened 1 week ago

DarkSRKI24 commented 1 week ago
Processing Prompt (1 / 1 tokens)
Generating (1 / 242 tokens)
(EOS token triggered! ID:2)
CtxLimit: 3768/4096, Process:0.44s (442.0ms/T = 2.26T/s), Generate:0.00s (3.0ms/T = 333.33T/s), Total:0.45s (2.25T/s)
Output:

It doesn't matter which model I use; it keeps happening. After 20 or so messages the model starts outputting nothing, or strings like "::::::::::::::::::". I updated both KoboldCpp and SillyTavern to the latest versions, but it keeps happening. I'm not that tech-savvy, so any guess is as good as mine.

DarkSRKI24 commented 1 week ago

Here is a link to the settings for the launcher; the file name is the model I'm using.

I don't understand what is happening. Does the context limit fill up, and then that's it, you can't interact anymore? Is there a way to empty out the oldest part of the context and gradually replace it with new messages?
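For what it's worth, the idea the question describes (dropping the oldest messages once the context window is full) is a standard technique often called a sliding window. The sketch below is a hypothetical illustration of that idea in Python, not KoboldCpp's actual implementation (KoboldCpp handles this internally with its ContextShift feature); the function name and token costs are made up for the example.

```python
# Sliding-window chat history: when the total token cost exceeds the
# context budget, drop the oldest messages until the rest fit.
# This is an illustrative sketch, not KoboldCpp's actual code.

def trim_history(messages, token_counts, max_tokens):
    """Return the newest messages whose combined token cost fits max_tokens."""
    msgs = list(messages)
    counts = list(token_counts)
    # Drop from the front (oldest) until we are under budget.
    while msgs and sum(counts) > max_tokens:
        msgs.pop(0)
        counts.pop(0)
    return msgs

# Example: four messages costing 1500 tokens each against a 4096-token budget.
history = ["msg1", "msg2", "msg3", "msg4"]
costs = [1500, 1500, 1500, 1500]
kept = trim_history(history, costs, 4096)
print(kept)  # the two oldest messages are dropped: ['msg3', 'msg4']
```

The trade-off is that dropped messages are gone for good, which is why frontends like SillyTavern also keep persistent character/world info pinned at the top of the prompt.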

LostRuins commented 1 week ago

Try using --contextsize 8192
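For reference, the flag is passed when launching KoboldCpp from the command line (or set as "Context Size" in the launcher GUI). A minimal invocation might look like the following; the model filename here is a placeholder:

```shell
# Launch KoboldCpp with an 8192-token context window.
# Replace mymodel.gguf with the actual model file.
python koboldcpp.py --model mymodel.gguf --contextsize 8192
```

Note that the frontend (SillyTavern) must also have its context size raised to match, or it will keep truncating at the old limit.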