LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Is continuous batching supported? #798

Open sirmo opened 5 months ago

sirmo commented 5 months ago

I wasn't able to find much about batching support, but it appears that the upstream llama.cpp supports it.

https://github.com/ggerganov/llama.cpp/issues/4372

Any plans to expose this feature in koboldcpp?

I really like using koboldcpp, and continuous batching would be quite nice for speeding up scripted workloads.

LostRuins commented 5 months ago

No, KoboldCpp does not support batching. However, it does support queueing, which is automatically enabled when launching with the GUI launcher. For terminal users, adding --multiuser enables this feature. You can then send multiple parallel requests, and they will be queued and completed in turn.
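
For reference, here is a minimal sketch of that queueing workflow: several requests are fired in parallel from a script, and a KoboldCpp instance launched with --multiuser queues and serves them one after another. The endpoint URL, port, and payload/response fields below are assumptions based on a default local install of the KoboldAI-compatible generate API; adjust them to your setup.

```python
import concurrent.futures
import requests

# Assumed default local KoboldCpp endpoint (KoboldAI-compatible generate API).
API_URL = "http://localhost:5001/api/v1/generate"

PROMPTS = [
    "Write a haiku about batching.",
    "Summarize the plot of Hamlet in one sentence.",
    "List three uses for a paperclip.",
]

def generate(prompt: str) -> str:
    # Field names follow the KoboldAI generate API; tweak for your config.
    payload = {"prompt": prompt, "max_length": 80}
    resp = requests.post(API_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["results"][0]["text"]

if __name__ == "__main__":
    # Fire all requests at once; the server queues them (no true batching),
    # so they are processed sequentially rather than merged into one batch.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(PROMPTS)) as pool:
        for prompt, text in zip(PROMPTS, pool.map(generate, PROMPTS)):
            print(f"--- {prompt}\n{text}\n")
```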