Closed ZexinLi0w0 closed 9 months ago
Thanks for the great survey! May I kindly suggest including a discussion of the state-of-the-art work Medusa in the efficient LLM inference part?
Code repo: https://github.com/FasterDecoding/Medusa
Blog: https://sites.google.com/view/medusa-llm
Paper: "Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads"
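For readers unfamiliar with the approach, here is a toy sketch of the core idea behind the paper's title: extra decoding heads attached to the same hidden state propose several future tokens at once, and the base model then verifies the proposals, accepting the longest matching prefix. Everything here is hypothetical (random weights, a stand-in `hidden_state` function, greedy matching for verification); the real Medusa trains its heads on top of an actual LLM and verifies candidates with tree attention, so this is only an illustration of the drafting/verification loop.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, K = 50, 16, 3  # toy vocab size, hidden size, number of extra heads

# Hypothetical weights: one base LM head plus K extra "Medusa-style" heads,
# all reading the same last hidden state.
W_base = rng.standard_normal((HIDDEN, VOCAB))
W_heads = rng.standard_normal((K, HIDDEN, VOCAB))

def hidden_state(prefix):
    # Stand-in for the LLM's last hidden state given a token prefix.
    h = np.zeros(HIDDEN)
    for tok in prefix:
        h = np.tanh(h + 0.1 * W_base[:, tok % VOCAB])
    return h

def draft(prefix):
    # One forward pass yields K+1 candidate tokens: the base head's next
    # token plus one guess per extra head for the tokens after it.
    h = hidden_state(prefix)
    nxt = int(np.argmax(h @ W_base))
    guesses = [int(np.argmax(h @ W_heads[k])) for k in range(K)]
    return [nxt] + guesses

def verify(prefix, candidates):
    # Accept candidates left to right while each matches the base model's
    # own autoregressive (greedy) choice; stop at the first mismatch.
    accepted, ctx = [], list(prefix)
    for tok in candidates:
        true_tok = int(np.argmax(hidden_state(ctx) @ W_base))
        if tok != true_tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

Because the first candidate is the base head's own prediction, at least one token is accepted per step, and any accepted extra-head guesses come for free, which is where the speedup comes from.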
Thanks for the suggestion! We've added it to the GitHub paper list and will include it in the next version of the survey~