LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Reprocessing Issue with Llama 3 #803

Closed Nabokov86 closed 2 months ago

Nabokov86 commented 2 months ago

When using Llama 3, I've noticed that unnecessary reprocessing occurs on previously generated text. To reproduce this issue, try generating a short piece of text a couple of times and watch how prompt processing sometimes happens again.

Latest concedo_experimental.

Nabokov86 commented 2 months ago

It seems like the reprocessing occurs after a new line is generated. (Screenshot from 2024-04-23 attached.)

LostRuins commented 2 months ago

Did you by any chance enable "Trim Sentences" or "Author Note"?

Nabokov86 commented 2 months ago

No, I use default settings without trimming. So, you can't reproduce it? saved_story.json

LostRuins commented 2 months ago

Yes, I can reproduce it. Looking closer, the tokenizer is behaving weirdly. I think there is an issue with token merges.

Relevant: https://github.com/ggerganov/llama.cpp/issues/6809

For now, you'll experience a small amount of reprocessing all the way back to the previous newline. This is a bug.
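To illustrate why the reprocessing reaches back to the previous newline: with BPE-style merges, appending new text can change how tokens near the end of the existing text are merged (e.g. two newline tokens fusing into a single multi-newline token), so the cached tokens are no longer a prefix of the new token sequence. This is a toy sketch, not the actual llama.cpp tokenizer; the merge rule here is invented purely for demonstration.

```python
def toy_tokenize(text):
    # Toy BPE-style rule: merge two consecutive newlines into one token,
    # loosely mimicking multi-newline merges in Llama 3's vocabulary.
    tokens = []
    i = 0
    while i < len(text):
        if text[i:i + 2] == "\n\n":
            tokens.append("\n\n")
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens

def common_prefix_len(a, b):
    # Length of the shared token prefix -- the part of the KV cache
    # that can be reused without reprocessing.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

old = toy_tokenize("Hello\n")         # cached context ends in a lone "\n"
new = toy_tokenize("Hello\n\nWorld")  # the newlines merge into one token

# The old tokens no longer prefix the new sequence, so everything from
# the newline onward must be reprocessed.
print(common_prefix_len(old, new))  # 5 -- only "Hello" is reusable
```

The same effect with the real tokenizer is why the reusable prefix shrinks back to the previous newline whenever generation continues across one.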

LostRuins commented 2 months ago

Hi, this should be fixed in the latest version. Remember to get freshly reconverted GGUFs.

Nabokov86 commented 2 months ago

@LostRuins Thanks! Yes, it looks like it’s working now. Thank you for continuing to maintain this project, you’re awesome!