LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Report a bug about "BLAS Batch Size" #693

Closed — sssfhfhchasd closed this issue 5 months ago

sssfhfhchasd commented 6 months ago

With version 1.54 and above, when loading "Qwen-14B-Chat.Q4_K_M.gguf" or "Causallm-14b-dpo-alpha.Q4_K_M.gguf", the model outputs normally for only the first few sentences, after which it produces nothing but meaningless garbled characters. Lowering "BLAS Batch Size" to its minimum value works around the problem, but it makes inference very slow. This seems to be a bug; I hope it can be fixed if possible. Thank you very much.
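[Editor's note: the launcher's "BLAS Batch Size" setting corresponds to koboldcpp's --blasbatchsize launch flag, so the workaround described above can also be applied from the command line. A minimal sketch, assuming the default of 512 triggers the corruption and that an intermediate value such as 64 is an acceptable speed/stability compromise; the model path matches the one reported in this issue:]

```sh
# Workaround sketch: launch koboldcpp with a smaller BLAS batch size.
# --usecublas and --blasbatchsize are existing koboldcpp flags; the value 64
# is illustrative, chosen between the default (512) and the minimum.
python koboldcpp.py --model Qwen-14B-Chat.Q4_K_M.gguf \
    --usecublas --blasbatchsize 64
```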

nopperl commented 5 months ago

@sssfhfhchasd are you using --usecublas?

sssfhfhchasd commented 5 months ago

@nopperl Yes, my graphics card is an "NVIDIA GeForce GTX 1050". The default setting is "usecublas".
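[Editor's note: one way to test whether the corruption is specific to the cuBLAS prompt-processing path, which nopperl's question points at, is to relaunch with a different BLAS backend and check whether garbled output still appears at the default batch size. A hedged diagnostic sketch, not a step requested in the thread; the platform/device ids are a common first guess, not confirmed values for this machine:]

```sh
# Diagnostic sketch: swap cuBLAS for the CLBlast backend to isolate the bug.
# --useclblast is an existing koboldcpp flag taking an OpenCL platform id
# and device id; "0 0" is an assumption for a single-GPU system.
python koboldcpp.py --model Qwen-14B-Chat.Q4_K_M.gguf --useclblast 0 0
```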