Open zhyncs opened 3 months ago
A detailed explanatory article by @feifeibear
@irexyc Let's conduct a deep dive into this great work
Microsoft published a blog post about Splitwise in January 2024, which is similar to Mooncake: https://www.microsoft.com/en-us/research/blog/splitwise-improves-gpu-usage-by-splitting-llm-inference-phases/
Motivation
TL;DR: This system has been deployed and validated at large scale in Kimi, so it is a valuable reference.
repo: https://github.com/kvcache-ai/Mooncake
tech report: https://github.com/kvcache-ai/Mooncake/blob/main/Mooncake-v1.pdf
zhihu: https://zhuanlan.zhihu.com/p/705754254 https://zhuanlan.zhihu.com/p/705910725
cc @lzhangzz @grimoire @lvhan028
Related resources
No response
Additional context
No response