bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Sparse KV cache design #5

Closed · chizhang118 closed this issue 4 months ago

chizhang118 commented 4 months ago

Design doc - https://bytedance.larkoffice.com/wiki/CiW4wA3etijfWHkN8rWctNOQnxg

https://bytedance.larkoffice.com/wiki/ShXJwPtyEiy9pKkVr0EcurS0nSg

guojunzzc commented 4 months ago

> Design doc - https://bytedance.larkoffice.com/wiki/CiW4wA3etijfWHkN8rWctNOQnxg
>
> https://bytedance.larkoffice.com/wiki/ShXJwPtyEiy9pKkVr0EcurS0nSg

Could you please open access to this design doc? We aren't able to view it. Thanks.

chizhang118 commented 4 months ago

> Design doc - https://bytedance.larkoffice.com/wiki/CiW4wA3etijfWHkN8rWctNOQnxg
> https://bytedance.larkoffice.com/wiki/ShXJwPtyEiy9pKkVr0EcurS0nSg
>
> Could you please open access to this design doc? We aren't able to view it. Thanks.

Thanks for your interest. The updated doc is available in this Google Doc: https://docs.google.com/document/d/13_cpb31P9VOmPGa_tZ70s7z1vXGP_UenXf1WVuIppCk/edit#heading=h.evhjclx158o3