bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Clarification on LongRoPE, sparse KV cache, and Infini-attention #6

Closed: chizhang118 closed this issue 1 month ago

chizhang118 commented 2 months ago

Infini-attention ("Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"): https://arxiv.org/abs/2404.07143. Augments standard local attention with a compressive memory that is updated segment by segment, so the effective context is unbounded while per-layer memory stays constant.
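
For context, here is a minimal numpy sketch of the compressive-memory recurrence from that paper: retrieval reads sigma(Q) M / (sigma(Q) z), the update folds in sigma(K)^T V. Variable names and shapes are illustrative only, not vLLM code:

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_memory_step(M, z, Q, K, V):
    """One segment of Infini-attention's compressive memory (simplified).

    M: (d_k, d_v) memory matrix, z: (d_k,) normalizer; Q, K: (seg, d_k),
    V: (seg, d_v). Retrieval uses memory written by *previous* segments,
    then the current segment is folded in. Both steps cost O(d_k * d_v),
    independent of how many segments have been processed.
    """
    sigma_q, sigma_k = elu_plus_one(Q), elu_plus_one(K)
    # Retrieve from the memory accumulated over earlier segments.
    A_mem = (sigma_q @ M) / (sigma_q @ z)[:, None]
    # Fold the current segment into the memory and the normalizer.
    M = M + sigma_k.T @ V
    z = z + sigma_k.sum(axis=0)
    return A_mem, M, z

d_k, d_v, seg = 8, 8, 4
M, z = np.zeros((d_k, d_v)), np.full(d_k, 1e-6)  # tiny init avoids divide-by-zero
Q, K = np.random.randn(seg, d_k), np.random.randn(seg, d_k)
V = np.random.randn(seg, d_v)
A_mem, M, z = infini_memory_step(M, z, Q, K, V)
```

In the full model, A_mem is mixed with ordinary local attention via a learned gate; the sketch above covers only the memory path.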

LongRoPE ("LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"): https://arxiv.org/abs/2402.13753. Extends the context window by searching for non-uniform, per-dimension rescaling factors of the RoPE frequencies, followed by progressive fine-tuning.
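
Roughly, the mechanism under discussion is a per-dimension rescaling of the position index before the rotary rotation. A toy sketch, assuming a plain non-interleaved RoPE layout; the rescale factors are hand-set here, whereas the paper finds them by evolutionary search:

```python
import numpy as np

def rescaled_rope(x, pos, rescale, base=10000.0):
    """Apply RoPE with per-dimension rescale factors (LongRoPE-style).

    x: (seq, dim) activations; pos: (seq,) absolute positions;
    rescale: (dim // 2,) factors lambda_i. Standard RoPE is all-ones;
    LongRoPE uses non-uniform values to stretch the context window.
    """
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)  # theta_i = base^(-2i/d)
    # Each rotary dimension sees an effectively shrunk position pos / lambda_i.
    angles = (pos[:, None] / rescale[None, :]) * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

x = np.random.randn(16, 64)
pos = np.arange(16, dtype=np.float64)
# Uniform 4x interpolation as a baseline; LongRoPE would vary this per dimension.
out = rescaled_rope(x, pos, rescale=np.full(32, 4.0))
```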

Sparse KV cache, i.e. H2O ("H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models"): https://arxiv.org/abs/2306.14048. Keeps the KV cache within a fixed budget by retaining recent tokens plus the "heavy hitter" tokens that accumulate the most attention, evicting the rest.
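
A toy sketch of that eviction policy: keep a recent window plus the older tokens with the highest accumulated attention. The function and parameter names are made up for illustration and are not part of vLLM:

```python
import numpy as np

def h2o_keep_set(attn_scores, keep_heavy, keep_recent):
    """Pick which KV-cache positions to keep under an H2O-style policy.

    attn_scores: (num_cached_tokens,) attention mass each cached token
    has accumulated so far; keep_heavy / keep_recent are budget sizes.
    Returns the sorted indices of tokens to retain.
    """
    n = attn_scores.shape[0]
    recent = set(range(max(0, n - keep_recent), n))  # always keep the local window
    older = [i for i in range(n) if i not in recent]
    # "Heavy hitters": older tokens with the largest accumulated scores.
    heavy = sorted(older, key=lambda i: attn_scores[i], reverse=True)[:keep_heavy]
    return sorted(recent | set(heavy))

scores = np.array([5.0, 0.1, 3.2, 0.05, 0.2, 1.5, 0.3, 0.4])
print(h2o_keep_set(scores, keep_heavy=2, keep_recent=3))  # -> [0, 2, 5, 6, 7]
```

In a real serving engine the scores would be maintained incrementally during decoding and eviction applied whenever the cache exceeds its budget.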

chizhang118 commented 2 months ago

https://bytedance.larkoffice.com/wiki/FseAwv4M0iSiWikSv3Ic3wdGnrg