InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] blazing great work about KV Cache: Mooncake #1884

Open zhyncs opened 3 months ago

zhyncs commented 3 months ago

Motivation

TLDR: This system has undergone large-scale deployment and validation at Kimi, so it has great reference value.

repo: https://github.com/kvcache-ai/Mooncake

tech report: https://github.com/kvcache-ai/Mooncake/blob/main/Mooncake-v1.pdf

zhihu: https://zhuanlan.zhihu.com/p/705754254 https://zhuanlan.zhihu.com/p/705910725

cc @lzhangzz @grimoire @lvhan028

Related resources

No response

Additional context

No response

zhyncs commented 3 months ago

ref: https://zhuanlan.zhihu.com/p/706097807

A detailed explanatory article by @feifeibear

lvhan028 commented 3 months ago

@irexyc Let's conduct a deep dive into this great work

zhyncs commented 3 months ago

Microsoft released a blog post about SplitWise in January 2024, which is similar to Mooncake: https://www.microsoft.com/en-us/research/blog/splitwise-improves-gpu-usage-by-splitting-llm-inference-phases/
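The core idea the two systems share — running the compute-bound prefill phase and the memory-bound decode phase on separate workers and shipping the KV cache between them — can be sketched roughly as below. All class and function names here are illustrative, not the actual API of Mooncake, SplitWise, or LMDeploy; in real deployments the handoff crosses GPUs or nodes over RDMA/NVLink rather than an in-process object.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    """Per-request key/value entries produced during prefill."""
    request_id: str
    num_prompt_tokens: int
    blocks: list = field(default_factory=list)

class PrefillWorker:
    """Runs the compute-bound prefill phase and emits a KVCache."""
    def prefill(self, request_id: str, prompt_tokens: list) -> KVCache:
        # Stand-in for one forward pass over the whole prompt; one
        # "block" per token plays the role of the attention K/V entries.
        blocks = [f"kv[{t}]" for t in prompt_tokens]
        return KVCache(request_id, len(prompt_tokens), blocks)

class DecodeWorker:
    """Runs the memory-bound decode phase against a transferred KVCache."""
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        out = []
        for i in range(max_new_tokens):
            # Each decode step appends one new KV entry and emits a token.
            cache.blocks.append(f"kv[gen{i}]")
            out.append(f"tok{i}")
        return out

def serve(prompt_tokens, max_new_tokens=4):
    """Disaggregated pipeline: prefill on one worker, then hand the
    KV cache to a separate decode worker (the transfer step elided)."""
    cache = PrefillWorker().prefill("req-0", prompt_tokens)
    return DecodeWorker().decode(cache, max_new_tokens), cache

tokens, cache = serve(["a", "b", "c"])
print(tokens)              # 4 generated tokens
print(len(cache.blocks))   # 3 prompt KV blocks + 4 decode KV blocks
```

The payoff of the split is that each pool can be sized and batched for its own bottleneck: prefill workers for throughput on long prompts, decode workers for latency and KV-cache capacity.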