FasterDecoding / Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
https://sites.google.com/view/medusa-llm
Apache License 2.0

Results for different configs #62

Closed zankner closed 11 months ago

zankner commented 12 months ago

Awesome project! I was wondering if you would be able to share the mt-bench results for the different Medusa configs. Specifically, from this ablation: [image]

leeyeehoo commented 11 months ago

We are working on V1.0. At that point, you can expect more detailed information on how the ablation works (and possibly an arXiv report).

zankner commented 11 months ago

Understood. If I am benchmarking for a paper, should I assume that the settings reported in the repository, i.e. `mc_sim_7b_63`, are the current optimal inference settings?

leeyeehoo commented 11 months ago

If you have your own model and you want to generate a sparse tree, please refer to the preview version. The folder contains a readme that will guide you through customizing your tree settings step by step.
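For context on what a tree config like `mc_sim_7b_63` looks like: in the repository, these configs are lists of tuples, where each tuple is a path from the root of the candidate tree (element *i* is the top-k index chosen at Medusa head *i*), and the tree must be prefix-closed. A minimal sketch with a made-up example tree (the real `mc_sim_7b_63` has 63 nodes tuned for the 7B model; the paths below are illustrative only):

```python
# Sketch of a Medusa sparse-tree config (hypothetical example tree,
# NOT the actual mc_sim_7b_63 from the repo).
# Each tuple is a path from the root: element i is the top-k index
# taken at Medusa head i. Every prefix of a path must also appear.

example_choices = [
    (0,), (1,), (2,),          # head 0: top-3 candidates
    (0, 0), (0, 1), (1, 0),    # head 1: continuations of likely paths
    (0, 0, 0),                 # head 2: extend only the best path
]

def validate_tree(choices):
    """Check prefix-closure and return the number of candidate nodes."""
    paths = set(choices)
    for path in paths:
        for depth in range(1, len(path)):
            assert path[:depth] in paths, f"missing prefix {path[:depth]}"
    return len(paths)

num_nodes = validate_tree(example_choices)
print(num_nodes)  # 7 candidate continuations in this toy tree
```

A custom tree generated per the preview readme should satisfy the same prefix-closure property before being passed to inference.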

zankner commented 11 months ago

Understood, thanks! Is there a timeline for when v1 will be fully released?

leeyeehoo commented 11 months ago

It will be very soon... We still have some minor issues to fix :) please stay tuned!

zankner commented 11 months ago

Are there any breaking changes? I.e., if I have been basing my code off the main branch, are there any bugs or issues with that branch? Sorry for all the questions.

leeyeehoo commented 11 months ago

The eval folder is self-contained and should be compatible with the original one. We are trying to implement other recent models with full fine-tuning, and that branch is not tested yet...

zankner commented 11 months ago

Ah ok thank you very much! Great work btw!