FasterDecoding / Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
https://sites.google.com/view/medusa-llm
Apache License 2.0
2.28k stars 155 forks

Update ROADMAP.md #21

Closed: leeyeehoo closed this 1 year ago

leeyeehoo commented 1 year ago

Initial tree sparsity experiments show promising results. Will merge this after careful evaluation.