LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

(Enhancement) queue system in api #373

Closed ArthurHoefer closed 1 year ago

ArthurHoefer commented 1 year ago

Implement a queue system in the api.

I'm currently building a project based on the koboldcpp API. The only problem is that if more than one person makes an API request at the same time, the later request is automatically rejected. A queue system in the API would let it complete those requests after it finishes the one it's currently working on.

LostRuins commented 1 year ago

I'd recommend making the queue layer a separate system instead of building it directly into the API. Queues won't work well with the current synchronous API approach: queued requests would consume available connections and could time out while waiting (some browsers limit the maximum connection timeout). A badly behaved client could also fill the queue with useless requests and choke it, forcing a full server restart.

Instead, I will add a new field called idle to the /api/extra/perf endpoint; you can check this value to determine whether the server is currently free or busy. Your client can then queue requests locally and only send one for generation once the server is available.
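
A minimal sketch of that client-side approach, assuming the server is reachable at http://localhost:5001 and that /api/extra/perf exposes the idle field described above (endpoint names and payload fields other than idle follow the standard KoboldAI generate API; adjust to your setup):

```python
import time
import requests

KOBOLD_URL = "http://localhost:5001"  # assumed default koboldcpp address


def server_is_idle():
    """Poll /api/extra/perf and read the 'idle' field described above."""
    try:
        perf = requests.get(f"{KOBOLD_URL}/api/extra/perf", timeout=5).json()
        return bool(perf.get("idle", 0))
    except requests.RequestException:
        return False


def process_queue(prompts):
    """Send queued prompts one at a time, waiting locally until the server is free."""
    results = []
    for prompt in prompts:
        # Wait on the client side instead of holding an open connection to the server.
        while not server_is_idle():
            time.sleep(1)
        resp = requests.post(
            f"{KOBOLD_URL}/api/v1/generate",
            json={"prompt": prompt, "max_length": 80},
            timeout=300,
        )
        results.append(resp.json())
    return results


if __name__ == "__main__":
    print(process_queue(["Hello,", "Once upon a time"]))
```

Note that polling is inherently racy if several clients poll at once, so the client should still handle a busy/rejected response and retry rather than assume the send will always succeed.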

LostRuins commented 12 months ago

Good news @ArthurHoefer , I have implemented a queue mode for koboldcpp which will be available in the next release. It will work out of the box.