About reproduction baseline results

dydrkfl06 commented 8 months ago

Thanks for sharing your great work!

We are reproducing your method for research purposes and noticed that your blog also reports Medusa inference as a baseline. We tried to measure the speed of both EAGLE and Medusa on Llama2 70B Chat, but it seems the official Medusa repo doesn't support the Llama2 architecture at inference time (perhaps Medusa's KV cache doesn't match Llama2's). We would be grateful if you could share your Medusa inference code for Llama2 70B Chat so that we can cross-check that EAGLE achieves far better acceleration on the baseline models.

Thanks for reading.

hongyanz commented 8 months ago

We didn't report Medusa's inference results on Llama2. Medusa's inference results on Vicuna were simply copied from Medusa's own technical report. You can ask Medusa's authors for support.

dydrkfl06 commented 8 months ago

Sorry for the misunderstanding. I'll ask Medusa's authors as you advised. Thanks!

SafeAILab / EAGLE, issue #27