Open sirmo opened 5 months ago
No, KoboldCpp does not support batching. However, it does support queueing, which is enabled automatically when launching with the GUI launcher. Terminal users can enable it by adding the --multiuser flag. You can then send multiple parallel requests, and they will be queued and completed in turn.
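To illustrate, here is a minimal sketch of sending parallel requests from a script against a KoboldCpp server started with --multiuser. It assumes the default port (5001) and the Kobold-style /api/v1/generate endpoint; the prompts and max_workers value are placeholders, and real payloads would usually include sampler settings as well.

```python
# Sketch: parallel requests to a KoboldCpp server launched with --multiuser.
# Assumes the default port 5001 and the /api/v1/generate endpoint.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_URL = "http://localhost:5001/api/v1/generate"  # default KoboldCpp port

def build_payload(prompt, max_length=80):
    # Minimal payload; KoboldCpp accepts many more sampler fields.
    return {"prompt": prompt, "max_length": max_length}

def generate(prompt):
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

if __name__ == "__main__":
    prompts = ["Write a haiku about llamas.", "Summarize batching in one line."]
    # The client fires these off concurrently; the server queues them
    # and works through them one at a time.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for text in pool.map(generate, prompts):
            print(text)
```

Note this is client-side concurrency only: the server still generates sequentially, so it hides request latency rather than increasing throughput the way true batching would.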
I have not been able to find much about batching support, but it appears that the upstream llama.cpp does support it:
https://github.com/ggerganov/llama.cpp/issues/4372
Are there any plans to expose this feature in koboldcpp?
I really enjoy using koboldcpp, and batching would be quite nice for speeding up scripted workloads.