SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
https://arxiv.org/pdf/2406.16858
Apache License 2.0

VLLM contribution #52

Closed by arunpatala 6 months ago

arunpatala commented 6 months ago

Thanks for this great repo. I would like to run EAGLE with vLLM and contribute to a vLLM implementation. If you are already working on a branch, I would like to help. If not, could you point me to the changes that need to be made? That would be most helpful.

Liyuhui-12 commented 6 months ago

Very welcome! The EAGLE draft model is essentially a single LLaMA decoder layer, so two models need to be created on the vLLM backend (the base model and the draft model), and the base model must be modified to output its hidden states. The hidden states are concatenated with the embeddings of the input_ids and passed through a dimension reduction before entering the draft model. The most important point is to modify the PagedAttention block table so that the entries for rejected tokens are truncated.
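
To make the two steps above concrete, here is a rough pure-Python sketch rather than anything taken from vLLM or the EAGLE code. Every name in it (`fuse_draft_input`, `ToySequence`, `ToyBlockAllocator`, `BLOCK_SIZE`) is hypothetical, and the real vLLM block manager is organised differently. It also assumes chain-style drafting, where the accepted tokens form a prefix of the drafted ones; tree-style drafting would need extra handling that is not shown.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 4  # tokens per KV-cache block (hypothetical value)


def fuse_draft_input(token_emb, hidden, fc_weight):
    """Concatenate a token embedding with a base-model hidden state, then
    project the 2*d vector back to d dimensions (the dimension reduction)."""
    x = list(token_emb) + list(hidden)
    return [sum(w * v for w, v in zip(row, x)) for row in fc_weight]


@dataclass
class ToySequence:
    """Hypothetical stand-in for one sequence's paged KV-cache bookkeeping."""
    num_tokens: int = 0
    block_table: List[int] = field(default_factory=list)


class ToyBlockAllocator:
    """Hypothetical allocator; NOT vLLM's block manager API."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    @staticmethod
    def _blocks_needed(num_tokens: int) -> int:
        return -(-num_tokens // BLOCK_SIZE)  # ceiling division

    def append_tokens(self, seq: ToySequence, n: int) -> None:
        """Reserve cache slots for n new tokens, allocating blocks as needed."""
        seq.num_tokens += n
        while len(seq.block_table) < self._blocks_needed(seq.num_tokens):
            seq.block_table.append(self.free.pop(0))

    def truncate(self, seq: ToySequence, num_tokens_kept: int) -> None:
        """Drop the slots of rejected draft tokens and free unused blocks."""
        seq.num_tokens = num_tokens_kept
        while len(seq.block_table) > self._blocks_needed(seq.num_tokens):
            self.free.append(seq.block_table.pop())


if __name__ == "__main__":
    alloc = ToyBlockAllocator(num_blocks=8)
    seq = ToySequence()
    alloc.append_tokens(seq, 6)  # prompt of 6 tokens
    alloc.append_tokens(seq, 5)  # 5 drafted tokens written to the cache
    alloc.truncate(seq, 6 + 2)   # verification accepted only 2 of them
    print(seq.num_tokens, seq.block_table)
```

In a real integration the truncation would also have to keep the draft model's cache in sync with the base model's, since both models hold KV entries for the rejected positions.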

arunpatala commented 6 months ago

Thanks for the information. I will let you know how it goes.