Closed ZexinLi0w0 closed 9 months ago
Thanks for the great survey! May I kindly suggest including a discussion of the state-of-the-art work Medusa in the efficient LLM inference part?
Code repo: https://github.com/FasterDecoding/Medusa
Blog: https://sites.google.com/view/medusa-llm
Paper: "Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads"
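For readers unfamiliar with the approach, here is a toy sketch of the core idea behind the paper's title: extra decoding heads attached to the same hidden state propose several future tokens at once, and the base model then verifies the proposals, accepting the longest matching prefix. Everything here is hypothetical (random weights, a stand-in `hidden_state` function, greedy matching for verification); the real Medusa trains its heads on top of an actual LLM and verifies candidates with tree attention, so this is only an illustration of the drafting/verification loop.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, K = 50, 16, 3  # toy vocab size, hidden size, number of extra heads

# Hypothetical weights: one base LM head plus K extra "Medusa-style" heads,
# all reading the same last hidden state.
W_base = rng.standard_normal((HIDDEN, VOCAB))
W_heads = rng.standard_normal((K, HIDDEN, VOCAB))

def hidden_state(prefix):
    # Stand-in for the LLM's last hidden state given a token prefix.
    h = np.zeros(HIDDEN)
    for tok in prefix:
        h = np.tanh(h + 0.1 * W_base[:, tok % VOCAB])
    return h

def draft(prefix):
    # One forward pass yields K+1 candidate tokens: the base head's next
    # token plus one guess per extra head for the tokens after it.
    h = hidden_state(prefix)
    nxt = int(np.argmax(h @ W_base))
    guesses = [int(np.argmax(h @ W_heads[k])) for k in range(K)]
    return [nxt] + guesses

def verify(prefix, candidates):
    # Accept candidates left to right while each matches the base model's
    # own autoregressive (greedy) choice; stop at the first mismatch.
    accepted, ctx = [], list(prefix)
    for tok in candidates:
        true_tok = int(np.argmax(hidden_state(ctx) @ W_base))
        if tok != true_tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

Because the first candidate is the base head's own prediction, at least one token is accepted per step, and any accepted extra-head guesses come for free, which is where the speedup comes from.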
Thanks for the suggestion! We've added it to the GitHub paper list and will include it in the next version of the survey~