run with vllm - Githubissues

SafeAILab / EAGLE

Official Implementation of EAGLE

https://arxiv.org/pdf/2406.16858

Apache License 2.0

622 stars, 59 forks

run with vllm #56

Closed Hevans123 closed 3 months ago

Hevans123 commented 3 months ago

Thanks for this great repo. How is the progress on vLLM support? Or could you point me to the major changes that would need to be made? That would be very helpful.

Liyuhui-12 commented 3 months ago

The draft model of EAGLE is essentially a single LLaMA decoder layer, so two models should be created with the vLLM backend, and the base model should be modified to output its hidden states. The hidden states and the embeddings of input_ids should be concatenated, reduced back to the hidden dimension, and then fed into the draft model. The most important thing to note is that the block table of PagedAttention has to be modified so that the KV-cache entries of rejected tokens are truncated.
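To make the two steps above concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not vLLM's or EAGLE's actual code. `fuse_features`, `SequenceBlocks`, and `truncate_rejected` are hypothetical names. The concatenation order (embedding first, then hidden state) and the use of a single linear layer for the dimension reduction are also assumptions. A real integration would work on tensors and go through vLLM's own block manager.

```python
from dataclasses import dataclass, field
from typing import List


def fuse_features(token_embedding: List[float],
                  hidden_state: List[float],
                  weight: List[List[float]],
                  bias: List[float]) -> List[float]:
    """Concatenate an embedding and a hidden state (length 2*d),
    then project back to length d with a linear layer (hypothetical)."""
    x = token_embedding + hidden_state  # list concatenation, length 2*d
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


@dataclass
class SequenceBlocks:
    """Toy stand-in for one sequence's entry in a paged KV cache."""
    block_size: int                      # tokens per physical block
    block_table: List[int] = field(default_factory=list)  # physical block ids
    num_tokens: int = 0                  # tokens currently stored


def truncate_rejected(seq: SequenceBlocks, num_rejected: int) -> List[int]:
    """Drop the last `num_rejected` tokens and return the physical
    blocks that are no longer needed so they can be freed."""
    if not 0 <= num_rejected <= seq.num_tokens:
        raise ValueError("num_rejected out of range")
    seq.num_tokens -= num_rejected
    # Ceiling division: number of blocks still required.
    keep = -(-seq.num_tokens // seq.block_size)
    freed = seq.block_table[keep:]
    del seq.block_table[keep:]
    return freed


# Usage: 6 committed tokens plus 4 draft tokens, of which 3 are rejected.
seq = SequenceBlocks(block_size=4, block_table=[7, 2, 9], num_tokens=10)
freed_blocks = truncate_rejected(seq, num_rejected=3)

# Usage: d = 2, so the projection maps 4 values back to 2.
fused = fuse_features([1.0, 2.0], [3.0, 4.0],
                      weight=[[1, 0, 1, 0], [0, 1, 0, 1]],
                      bias=[0.0, 0.0])
```

In the usage example, 7 tokens remain after rejection, so two blocks of size 4 are kept and the third is returned for freeing. Any stale slots inside the last kept block are simply overwritten by later tokens.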