dongxiaolong opened this issue 1 year ago
I'm working on this right now :) Here are the initial commits for this integration: https://github.com/ctlllll/axolotl/commit/61fc938f431b59dc91d0327949dae48d9f65b053 https://github.com/ctlllll/axolotl/commit/0c97a58ec9ffa027285344f83599d670907cee64 We'll continue developing this and make the pipeline smoother.
⚠️ Please check that this feature request hasn't been suggested before.
🔖 Feature description
Medusa (https://github.com/FasterDecoding/Medusa/tree/main) is a streamlined, user-centric framework designed to improve LLM generation efficiency. Rather than adding a separate draft model as in speculative decoding, Medusa attaches a few extra decoding heads to the base model, drawing inspiration from [Stern et al. 2018] among other work. Despite this minimalist approach, Medusa can speed up LLM generation by roughly 2x. The project covers both training and inference, and given its potential, I believe it is worth implementing.
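To make the idea concrete, here is a toy, pure-Python sketch of Medusa-style decoding (a hypothetical illustration, not the actual Medusa code): extra heads propose several future tokens at once, and a single base-model pass verifies them, accepting the longest correct prefix plus one token from the verification pass itself. The "base model" here is just a lookup into a fixed target string, and the heads are deliberately wrong on every third position to show the fallback path.

```python
# Toy illustration of Medusa-style multi-head decoding (hypothetical example).
# The base model deterministically predicts the next token of TARGET; K extra
# "heads" guess the next K tokens in one shot, and one verification pass per
# iteration accepts the longest matching prefix.

TARGET = list("medusa speeds up llm decoding")

def base_model_next(prefix):
    """Ground-truth next token given the generated prefix."""
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else None

def medusa_heads(prefix, k=4):
    """K heads propose the next k tokens at once (here: imperfect guesses)."""
    out = []
    for i in range(k):
        pos = len(prefix) + i
        if pos >= len(TARGET):
            break
        # pretend the heads are wrong on every third position
        out.append("?" if pos % 3 == 2 else TARGET[pos])
    return out

def medusa_decode():
    prefix, forward_passes = [], 0
    while len(prefix) < len(TARGET):
        candidates = medusa_heads(prefix)
        forward_passes += 1  # one base-model pass verifies all candidates
        for tok in candidates:
            if tok == base_model_next(prefix):
                prefix.append(tok)  # accept matching candidate
            else:
                break
        if len(prefix) < len(TARGET):
            # the verification pass also yields the base model's own next
            # token, so we always advance by at least one token per pass
            prefix.append(base_model_next(prefix))
    return "".join(prefix), forward_passes

text, passes = medusa_decode()
print(text)
print(passes, "passes for", len(TARGET), "tokens")
```

In this toy run the output matches the target in 10 verification passes instead of 29 sequential ones; real speedups depend on how often the heads agree with the base model.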
✔️ Solution
We need to implement Medusa support, including the Medusa heads, tree attention, and the typical acceptance module. A training configuration must also be added.
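The typical acceptance piece can be sketched as follows. This is a minimal, hedged interpretation of the rule described in the Medusa paper: a candidate token is accepted when its probability under the original model clears a threshold that adapts to the entropy of the distribution (flat distributions get a lower bar). The `eps` and `delta` values are illustrative defaults, not tuned settings from the paper or the linked commits.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def typical_accept(probs, token_id, eps=0.3, delta=0.09):
    """Accept `token_id` if p(token) >= min(eps, delta * exp(-H(p))).

    High-entropy (flat) distributions shrink exp(-H), lowering the
    threshold so more candidate tokens pass; peaked distributions
    keep the bar higher for low-probability tokens.
    """
    threshold = min(eps, delta * math.exp(-entropy(probs)))
    return probs[token_id] >= threshold

# Peaked distribution: the dominant token passes, a rare one does not.
print(typical_accept([0.9, 0.05, 0.05], 0))  # True
print(typical_accept([0.9, 0.05, 0.05], 1))  # False
```

Tree attention would then let several candidate continuations share one verification pass by masking attention along a token tree, but that part depends on model internals and is better read directly from the commits linked above.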
❓ Alternatives
No response
📝 Additional Context
No response
Acknowledgements