Open phymbert opened 2 months ago
@phymbert What would be the side effects (or other objections or snags) of adding a SLOT_STATE_RESERVED status to the two existing slot states, SLOT_STATE_IDLE and SLOT_STATE_PROCESSING, so that some slots could be kept in reserve for new prompts or running chats and new requests don't bump them? It struck me while I was playing with my slot graphics that this might be desirable, and now it has emerged as an issue, so what do you think?
Context
At the moment we take a FIFO approach to batching prompt tokens, so if a large prompt has to be processed, it blocks all other slots.
Proposal: implement fair use of the batch for prompt processing across all pending slots.
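To make the proposal concrete, here is a minimal sketch of one possible fair-share policy, not the actual server implementation. The function name `fair_prompt_share`, the `pending` vector, and the `n_batch` parameter as used here are hypothetical and only for illustration. It assumes each slot reports how many prompt tokens it still has to process, and it splits the batch budget evenly, handing unused share back to slots that still want more.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the server code): split a batch budget of
// n_batch tokens across slots that still have prompt tokens to process.
// pending[i] is the number of unprocessed prompt tokens for slot i.
// Returns how many prompt tokens each slot may add to the next batch.
std::vector<int> fair_prompt_share(const std::vector<int> & pending, int n_batch) {
    assert(n_batch >= 0);

    std::vector<int> share(pending.size(), 0);
    int budget = n_batch;

    while (budget > 0) {
        // Count the slots that still want more tokens than they were given.
        int n_active = 0;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (pending[i] > share[i]) {
                n_active++;
            }
        }
        if (n_active == 0) {
            break; // every pending prompt fits in the batch
        }

        // Equal quota per active slot for this round (at least one token).
        const int quota = std::max(1, budget / n_active);

        for (std::size_t i = 0; i < pending.size() && budget > 0; ++i) {
            const int want = pending[i] - share[i];
            if (want <= 0) {
                continue;
            }
            const int give = std::min({want, quota, budget});
            share[i] += give;
            budget   -= give;
        }
    }

    return share;
}
```

With pending prompts of 1000, 10, and 20 tokens and a batch of 512, this sketch gives the two short prompts their full 10 and 20 tokens and the large prompt the remaining 482, whereas a FIFO policy would spend all 512 tokens on the first prompt.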
References: