Closed ArthurHoefer closed 1 year ago
I'd recommend building the queue layer as a separate system rather than directly in the API. Queues don't fit well with the current synchronous API approach: every waiting request holds one of the available connections open, and it may time out while waiting (some browsers cap the maximum connection time). A badly behaved client can also fill the queue with useless requests and choke it, forcing a restart of the entire server.
Instead, I will add a new field called `idle` to the `/api/extra/perf` endpoint. You can check this value to determine whether the server is currently free or busy. Your client can then queue the request locally and only send it for generation once the server is available.
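For illustration, the local queueing described above could look roughly like the sketch below. It is not part of koboldcpp: the helper names (`fetch_perf`, `server_is_idle`, `drain_queue`), the base URL, and the polling interval are all hypothetical, and it assumes `idle` is truthy when the server is free. How a request is actually sent is left to a `send` callable that you supply.

```python
import json
import time
import urllib.request
from collections import deque
from typing import Any, Callable, Deque, List

BASE_URL = "http://localhost:5001"  # assumption: adjust to wherever your server listens


def fetch_perf(base_url: str = BASE_URL) -> dict:
    """GET /api/extra/perf and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/extra/perf", timeout=10) as resp:
        return json.load(resp)


def server_is_idle(perf: dict) -> bool:
    # Assumption: the `idle` field is truthy (1 / true) when the server is free.
    return bool(perf.get("idle"))


def drain_queue(
    pending: Deque[Any],
    get_perf: Callable[[], dict],
    send: Callable[[Any], Any],
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Send queued requests one at a time, in order, only while the server reports idle."""
    results = []
    while pending:
        if server_is_idle(get_perf()):
            # Server looks free: send the oldest queued request.
            results.append(send(pending.popleft()))
        else:
            # Server is busy: wait before polling again.
            sleep(poll_seconds)
    return results
```

A client would call something like `drain_queue(deque(payloads), fetch_perf, my_send)`, where `my_send` posts one payload to the generation endpoint. Note that another client can still grab the server between the idle check and the send, so `my_send` should be prepared to retry a rejected request.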
Good news @ArthurHoefer, I have implemented a queue mode for koboldcpp, which will be available in the next release. It will work out of the box.
Implement a queue system in the api.
I'm currently building a project based on the Kobold API. The only problem is that if more than one person makes an API request at the same time, the later request is automatically rejected. A queue system in the API would let the server complete that request after finishing the one it is currently working on.