OpenAccess-AI-Collective / axolotl

Go ahead and axolotl questions
https://openaccess-ai-collective.github.io/axolotl/
Apache License 2.0

Medusa training and inference #557

Open dongxiaolong opened 9 months ago

dongxiaolong commented 9 months ago

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

Medusa (https://github.com/FasterDecoding/Medusa/tree/main) is a streamlined, user-centric framework designed to enhance LLM generation efficiency. Rather than adding a separate draft model, as in speculative decoding, Medusa simply attaches a few extra decoding heads to the base model, drawing on [Stern et al. 2018] among other work. Despite its minimalist approach, Medusa can boost LLM generation efficiency by approximately 2x. The project covers both training and inference, and given its potential, I believe it is worth implementing.
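To make the "extra decoding heads" idea concrete, here is a minimal numpy sketch: each head is a small residual block followed by an LM projection, and head k predicts the token k+1 positions ahead from the same hidden states. All names, shapes, and the residual layout are illustrative assumptions, not Medusa's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    # SiLU(x) = x * sigmoid(x), a common activation in such blocks
    return x / (1.0 + np.exp(-x))

def medusa_head(hidden, w_proj, w_lm):
    # Residual block keeps the base model's representation intact;
    # the final projection produces logits for the token k steps ahead.
    return (hidden + silu(hidden @ w_proj)) @ w_lm

hidden_size, vocab_size = 16, 100
num_heads = 4  # K extra heads, each predicting position t+1+k in parallel

# Randomly initialized weights stand in for trained head parameters.
heads = [
    (rng.normal(size=(hidden_size, hidden_size)) * 0.02,
     rng.normal(size=(hidden_size, vocab_size)) * 0.02)
    for _ in range(num_heads)
]

# Last hidden states from the (frozen) base model: (batch, seq, hidden)
hidden = rng.normal(size=(1, 8, hidden_size))

# One forward pass yields K sets of logits, i.e. K draft tokens per position.
logits = [medusa_head(hidden, w_proj, w_lm) for w_proj, w_lm in heads]
print(logits[0].shape)  # (1, 8, 100)
```

Because all K heads read the same hidden states, the draft tokens come out of a single forward pass; verifying them jointly is what the tree-attention piece below handles.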

✔️ Solution

We need to implement Medusa, including the Medusa heads, tree attention, and the typical acceptance module. A training configuration must also be added.
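Of the three components, typical acceptance is the easiest to sketch in isolation: a drafted token is accepted when its probability under the base model clears an entropy-dependent threshold, so confident (low-entropy) distributions demand a higher bar than flat ones. The sketch below is a hedged illustration; the function name and the `eps`/`delta` values are assumptions, not the paper's tuned defaults.

```python
import numpy as np

def typical_accept(probs, token, eps=0.3, delta=0.09):
    """Accept a drafted `token` if its base-model probability exceeds
    min(eps, delta * exp(-H(probs))), where H is the entropy of the
    base model's distribution. Hyperparameter values are illustrative."""
    p = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p))
    threshold = min(eps, delta * np.exp(-entropy))
    return probs[token] >= threshold

# A peaked distribution: the top token passes, a low-probability one does not.
probs = np.array([0.9, 0.05, 0.05])
print(typical_accept(probs, 0))  # True
print(typical_accept(probs, 1))  # False
```

In the full pipeline this check runs along each path of the draft-token tree, and the longest all-accepted prefix is committed before the next speculation step.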

❓ Alternatives

No response

📝 Additional Context

No response


ctlllll commented 9 months ago

I'm working on this right now :) Here are two initial commits for this integration:
https://github.com/ctlllll/axolotl/commit/61fc938f431b59dc91d0327949dae48d9f65b053
https://github.com/ctlllll/axolotl/commit/0c97a58ec9ffa027285344f83599d670907cee64
We'll continue to develop this and make the pipeline smoother.