Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Motivation.
In the current design, users cannot set VLLM_DECODE_BLOCK_BUCKET_MIN/MAX/STEP properly for a small batch size and a large batch size at the same time.
For example, consider requests with input_len 512 and max_output_len 1024, and batch sizes from 1 to 128.
For bs=1, the user needs to set VLLM_DECODE_BLOCK_BUCKET_* to the min/max block count of a single sequence.
For bs=128, the user needs to set them to roughly 128x those values, since the buckets bound the total block count across the batch.
Right now, there is no way to set these environment variables so that both bs=1 and bs=128 are covered in one warmup.
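To make the mismatch concrete, a minimal sketch of the block-count arithmetic (BLOCK_SIZE = 128 is an assumed value for illustration; the actual value depends on the deployment):

```python
# Assumed block size for illustration only.
BLOCK_SIZE = 128
input_len, max_output_len = 512, 1024

# Blocks needed by one sequence at its longest (ceiling division).
max_seq_len = input_len + max_output_len
blocks_per_seq = -(-max_seq_len // BLOCK_SIZE)  # 12 blocks

# VLLM_DECODE_BLOCK_BUCKET_MIN/MAX bound the *total* block count
# across the batch, so the two batch sizes need very different ranges:
for bs in (1, 128):
    print(f"bs={bs}: total blocks up to {bs * blocks_per_seq}")
# bs=1 needs buckets up to 12 blocks, while bs=128 needs up to 1536,
# so one tight MIN/MAX/STEP range cannot serve both.
```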
Proposed Change.
I am proposing to change VLLM_DECODE_BLOCK_BUCKET_* to VLLM_DECODE_SEQ_*:
VLLM_DECODE_SEQ_MIN: min seq length
VLLM_DECODE_SEQ_MAX: max seq length
VLLM_DECODE_SEQ_STEP: seq length step
During graph warmup, vLLM computes each graph shape as:
(bs, total_block_number) = (bs, bs x (VLLM_DECODE_SEQ_MIN + VLLM_DECODE_SEQ_STEP x N) / BLOCK_SIZE)
For bs=1 and bs=128, the user can set the VLLM_DECODE_SEQ_* values as:
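A minimal sketch of how warmup shapes could be enumerated under this proposal. The env var names come from this RFC; the concrete values and BLOCK_SIZE below are illustrative assumptions, not vLLM defaults:

```python
# Values a user might export; in practice these would be read from the
# proposed VLLM_DECODE_SEQ_* environment variables.
BLOCK_SIZE = 128   # assumed block size
seq_min = 128      # VLLM_DECODE_SEQ_MIN
seq_max = 1536     # VLLM_DECODE_SEQ_MAX
seq_step = 512     # VLLM_DECODE_SEQ_STEP

def decode_graph_shapes(bs):
    """Yield (bs, total_block_number) for each warmed-up decode bucket."""
    seq_len = seq_min
    while seq_len <= seq_max:
        yield (bs, bs * seq_len // BLOCK_SIZE)
        seq_len += seq_step

# The same three settings now cover both small and large batches:
for bs in (1, 128):
    print(bs, list(decode_graph_shapes(bs)))
```

Because the bounds are expressed per sequence and scaled by bs inside the formula, one setting yields sensible buckets for bs=1 and bs=128 alike.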
Feedback Period.
No response
CC List.
@kzawora-intel Please kindly provide your feedback. Thanks.
Any Other Things.
No response