LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Multi-task concurrency is needed, which is very important for API users; a similar feature has already been implemented in ollama #866

Open windkwbs opened 1 month ago

windkwbs commented 1 month ago

Multi-task concurrency is needed, which is very important for API users. A similar feature has already been implemented in ollama.

LostRuins commented 1 month ago

Use the --multiuser flag.

windkwbs commented 1 month ago

After testing, --multiuser only queues requests for sequential execution; multiple tasks cannot run simultaneously.

LostRuins commented 1 month ago

Yes, they will be executed in sequence. Koboldcpp does not allow parallel decoding.
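To illustrate the distinction being discussed: with --multiuser, concurrent API requests are accepted, but generation is serialized, so total latency grows with the number of in-flight requests rather than overlapping. A minimal local sketch of that queueing behavior (a lock standing in for the single decode slot; this simulates the semantics and is not KoboldCpp's actual code):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

decode_slot = threading.Lock()  # stands in for the single sequential decoder

def generate(prompt, decode_time=0.1):
    # Requests arrive concurrently, but decoding happens one at a time.
    with decode_slot:
        time.sleep(decode_time)  # simulated token generation
        return f"completion for {prompt!r}"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(generate, ["a", "b", "c", "d"]))
elapsed = time.monotonic() - start

# Four 0.1 s generations finish in roughly 0.4 s total (sum),
# not 0.1 s (max), because decoding is queued, not parallel.
print(f"{elapsed:.2f}s", results)
```

With true parallel decoding (as in ollama's concurrency support), the four requests would overlap and finish in roughly the time of the longest single request instead.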