InftyAI / llmaz

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
Apache License 2.0

Support speculative decoding #59

Closed kerthcet closed 2 months ago

kerthcet commented 3 months ago

What would you like to be added:

Speculative decoding helps accelerate token generation in large language models, and it is supported by vLLM out of the box.
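For context, a minimal toy sketch of the draft-and-verify idea behind speculative decoding (this is an illustration of the general technique, not vLLM's or llmaz's implementation; the `target`/`draft` "models" below are hypothetical stand-ins over integer tokens):

```python
# Toy sketch of speculative decoding (draft-and-verify): a cheap "draft"
# model proposes k tokens ahead, and the expensive "target" model verifies
# them, keeping the longest agreeing prefix. Not vLLM's implementation.

def speculative_step(target, draft, prefix, k=4):
    """One decoding step: draft proposes k tokens, target verifies."""
    # The draft model autoregressively proposes k candidate tokens.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # The target model would score all k positions in one batched pass;
    # here we simulate that by querying it at each growing prefix.
    accepted = []
    ctx = list(prefix)
    for tok in proposal:
        if target(ctx) == tok:            # target agrees with the draft
            accepted.append(tok)
            ctx.append(tok)
        else:                             # first disagreement: emit the
            accepted.append(target(ctx))  # target's own token and stop
            break
    else:
        # All k draft tokens accepted; target contributes one bonus token.
        accepted.append(target(ctx))
    return accepted

# Tiny deterministic "models" over integer tokens, for illustration only.
target = lambda ctx: (sum(ctx) + 1) % 7
draft = lambda ctx: (sum(ctx) + 1) % 7 if len(ctx) % 2 == 0 else 0

out = speculative_step(target, draft, [1, 2], k=4)
```

The key property is that the output is identical to what greedy decoding with the target model alone would produce; the draft model only changes how many target-model passes are needed, which is where the throughput win comes from.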

Why is this needed:

Improve inference throughput.

Completion requirements:

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.

kerthcet commented 3 months ago

/kind feature /milestone v0.1.0

kerthcet commented 3 months ago

See https://github.com/vllm-project/vllm/issues/4630

kerthcet commented 2 months ago

/close

Closing, as this is supported in vLLM.