Closed kerthcet closed 2 months ago
What would you like to be added:
Support speculative decoding, which accelerates token generation for large language models and is supported by vLLM by default.
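For context, here is a minimal toy sketch of the greedy variant of speculative decoding. It is not vLLM's implementation and does not use the vLLM API. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are plain functions that map a token list to the next token. A cheap draft model proposes up to `k` tokens, the target model checks them, and the matching prefix is kept, so the output is the same as decoding with the target alone.

```python
from typing import Callable, List

# A "model" here is just a function: context tokens -> next token (assumption).
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed token. A real engine would score
        #    all positions in one batched forward pass; this loop only mimics
        #    the accept/reject logic.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every proposal was accepted: take one extra target token if room.
            if len(tokens) + len(accepted) < limit:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens


def toy_target(ctx: List[int]) -> int:
    # Hypothetical "large" model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except after a 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [0], max_new_tokens=12))
```

The speedup in a real system comes from step 2 being a single target forward pass over several positions, which this sketch does not model.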
Why is this needed:
Improve the inference throughput.
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.
/kind feature
/milestone v0.1.0
See https://github.com/vllm-project/vllm/issues/4630
/close
Closing, as this is already supported in vLLM.