OpenAccess-AI-Collective / axolotl

Go ahead and axolotl questions
https://openaccess-ai-collective.github.io/axolotl/
Apache License 2.0

Medusa training and inference #557

Open dongxiaolong opened 9 months ago

dongxiaolong commented 9 months ago

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

Medusa (https://github.com/FasterDecoding/Medusa/tree/main) is a streamlined, user-centric framework designed to enhance LLM generation efficiency. Rather than adding a separate draft model, as in speculative decoding, Medusa simply attaches a few extra decoding heads to the base model, drawing on [Stern et al. 2018] among other work. Despite its minimalist approach, Medusa can boost LLM generation efficiency by approximately 2x. The project covers both training and inference, and given its potential, I believe it is worth implementing.
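To make the "extra decoding heads" idea concrete, here is a minimal numpy sketch: each head is a small residual block followed by an LM projection, and head k predicts the token k+1 positions ahead from the same hidden states. All names, shapes, and the residual layout are illustrative assumptions, not Medusa's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    # SiLU(x) = x * sigmoid(x), a common activation in such blocks
    return x / (1.0 + np.exp(-x))

def medusa_head(hidden, w_proj, w_lm):
    # Residual block keeps the base model's representation intact;
    # the final projection produces logits for the token k steps ahead.
    return (hidden + silu(hidden @ w_proj)) @ w_lm

hidden_size, vocab_size = 16, 100
num_heads = 4  # K extra heads, each predicting position t+1+k in parallel

# Randomly initialized weights stand in for trained head parameters.
heads = [
    (rng.normal(size=(hidden_size, hidden_size)) * 0.02,
     rng.normal(size=(hidden_size, vocab_size)) * 0.02)
    for _ in range(num_heads)
]

# Last hidden states from the (frozen) base model: (batch, seq, hidden)
hidden = rng.normal(size=(1, 8, hidden_size))

# One forward pass yields K sets of logits, i.e. K draft tokens per position.
logits = [medusa_head(hidden, w_proj, w_lm) for w_proj, w_lm in heads]
print(logits[0].shape)  # (1, 8, 100)
```

Because all K heads read the same hidden states, the draft tokens come out of a single forward pass; verifying them jointly is what the tree-attention piece below handles.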

✔️ Solution

We need to implement Medusa, including the Medusa heads, tree attention, and the typical acceptance module. A training configuration must also be added.
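Of the three components, typical acceptance is the easiest to sketch in isolation: a drafted token is accepted when its probability under the base model clears an entropy-dependent threshold, so confident (low-entropy) distributions demand a higher bar than flat ones. The sketch below is a hedged illustration; the function name and the `eps`/`delta` values are assumptions, not the paper's tuned defaults.

```python
import numpy as np

def typical_accept(probs, token, eps=0.3, delta=0.09):
    """Accept a drafted `token` if its base-model probability exceeds
    min(eps, delta * exp(-H(probs))), where H is the entropy of the
    base model's distribution. Hyperparameter values are illustrative."""
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p))
    threshold = min(eps, delta * np.exp(-entropy))
    return probs[token] >= threshold

# A peaked distribution: the top token passes, a low-probability one does not.
probs = np.array([0.9, 0.05, 0.05])
print(typical_accept(probs, 0))  # True
print(typical_accept(probs, 1))  # False
```

In the full pipeline this check runs along each path of the draft-token tree, and the longest all-accepted prefix is committed before the next speculation step.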

❓ Alternatives

No response

📝 Additional Context

No response


ctlllll commented 9 months ago

I'm working on this right now :) Here are two initial commits for this integration:
https://github.com/ctlllll/axolotl/commit/61fc938f431b59dc91d0327949dae48d9f65b053
https://github.com/ctlllll/axolotl/commit/0c97a58ec9ffa027285344f83599d670907cee64
We'll continue to develop this and make the pipeline smoother.