LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.35k stars 312 forks

Wrong tokens / second #852

Closed: EugeoSynthesisThirtyTwo closed this issue 1 month ago

EugeoSynthesisThirtyTwo commented 1 month ago

It says:

Processing Prompt [BLAS] (1676 / 1676 tokens)
Generating (78 / 387 tokens)
(EOS token triggered!)
(Special Stop Token Triggered! ID:128009)
CtxLimit: 1754/8192, Process:25.05s (14.9ms/T = 66.89T/s), Generate:59.05s (152.6ms/T = 6.55T/s), Total:84.11s (4.60T/s)

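But 6.55T/s is the speed that would have been achieved if the model had generated all 387 requested tokens. The model actually generated only 78 tokens (the context usage confirms this: 1676 prompt tokens + 78 generated = 1754), so the real generation speed is 78 / 59.05 s ≈ 1.32 tokens/s. The other generation figures are off for the same reason: 152.6ms/T is 59.05 s / 387, and the Total 4.60T/s is 387 / 84.11 s.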
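A minimal Python sketch of the arithmetic above, assuming only the stats line quoted in the report. `parse_times` and `speed_stats` are hypothetical helpers written for illustration; they are not KoboldCpp functions. The sketch computes each figure twice: once with the requested length (387), which reproduces the logged numbers, and once with the 78 tokens that were actually generated.

```python
import re

# Stats line copied from the report above.
LOG_LINE = (
    "CtxLimit: 1754/8192, Process:25.05s (14.9ms/T = 66.89T/s), "
    "Generate:59.05s (152.6ms/T = 6.55T/s), Total:84.11s (4.60T/s)"
)


def parse_times(line: str) -> tuple[float, float]:
    """Hypothetical helper: extract the Generate and Total times (seconds)."""
    gen = re.search(r"Generate:([\d.]+)s", line)
    total = re.search(r"Total:([\d.]+)s", line)
    if gen is None or total is None:
        raise ValueError("unrecognized stats line")
    return float(gen.group(1)), float(total.group(1))


def speed_stats(tokens: int, gen_s: float, total_s: float) -> dict[str, float]:
    """Hypothetical helper: latency and throughput for a given token count."""
    return {
        "ms_per_token": gen_s * 1000 / tokens,
        "gen_tps": tokens / gen_s,
        "total_tps": tokens / total_s,
    }


gen_s, total_s = parse_times(LOG_LINE)
prompt, requested, generated = 1676, 387, 78  # from the log lines above

# Context usage is prompt + generated, which confirms that 78 is the real count.
ctx_used = int(re.search(r"CtxLimit: (\d+)/", LOG_LINE).group(1))
assert ctx_used == prompt + generated

reported = speed_stats(requested, gen_s, total_s)  # divides by the requested length
actual = speed_stats(generated, gen_s, total_s)    # divides by tokens really produced

for label, s in (("reported", reported), ("actual", actual)):
    print(f"{label}: {s['ms_per_token']:.1f}ms/T = {s['gen_tps']:.2f}T/s, "
          f"total {s['total_tps']:.2f}T/s")
```

Running it prints `reported: 152.6ms/T = 6.55T/s, total 4.60T/s` followed by `actual: 757.1ms/T = 1.32T/s, total 0.93T/s`. The only change between the two rows is the denominator, which is the whole bug.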

LostRuins commented 1 month ago

Will try to fix it.

LostRuins commented 1 month ago

Can you see if the latest version solves this issue?

EugeoSynthesisThirtyTwo commented 1 month ago

It's good, thank you!

[screenshot]

However, as you can see, when I abort the generation a new log line appears: "Generating (301 / 300 tokens)", which is wrong because the generated count exceeds the maximum. I don't know if it's related. Let me know if I should open a new issue for this.

LostRuins commented 1 month ago

Don't worry about that, probably just a minor thing.