bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Clarification on LongRoPE, sparse KV cache, and Infini-attention #6

Closed: chizhang118 closed this issue 1 month ago

chizhang118 commented 2 months ago

Infini-attention ("Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention"): https://arxiv.org/abs/2404.07143. Augments standard local attention with a compressive memory that is updated segment by segment, so the effective context is unbounded while per-layer memory stays constant.
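
For context, here is a minimal numpy sketch of the compressive-memory recurrence from that paper: retrieval reads sigma(Q) M / (sigma(Q) z), the update folds in sigma(K)^T V. Variable names and shapes are illustrative only, not vLLM code:

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, the nonlinearity used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_memory_step(M, z, Q, K, V):
    """One segment of Infini-attention's compressive memory (simplified).

    M: (d_k, d_v) memory matrix, z: (d_k,) normalizer; Q, K: (seg, d_k),
    V: (seg, d_v). Retrieval uses memory written by *previous* segments,
    then the current segment is folded in. Both steps cost O(d_k * d_v),
    independent of how many segments have been processed.
    """
    sigma_q, sigma_k = elu_plus_one(Q), elu_plus_one(K)
    # Retrieve from the memory accumulated over earlier segments.
    A_mem = (sigma_q @ M) / (sigma_q @ z)[:, None]
    # Fold the current segment into the memory and the normalizer.
    M = M + sigma_k.T @ V
    z = z + sigma_k.sum(axis=0)
    return A_mem, M, z

d_k, d_v, seg = 8, 8, 4
M, z = np.zeros((d_k, d_v)), np.full(d_k, 1e-6)  # tiny init avoids divide-by-zero
Q, K = np.random.randn(seg, d_k), np.random.randn(seg, d_k)
V = np.random.randn(seg, d_v)
A_mem, M, z = infini_memory_step(M, z, Q, K, V)
```

In the full model, A_mem is mixed with ordinary local attention via a learned gate; the sketch above covers only the memory path.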

LongRoPE ("LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens"): https://arxiv.org/abs/2402.13753. Extends the context window by searching for non-uniform, per-dimension rescaling factors of the RoPE frequencies, followed by progressive fine-tuning.
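
Roughly, the mechanism under discussion is a per-dimension rescaling of the position index before the rotary rotation. A toy sketch, assuming a plain non-interleaved RoPE layout; the rescale factors are hand-set here, whereas the paper finds them by evolutionary search:

```python
import numpy as np

def rescaled_rope(x, pos, rescale, base=10000.0):
    """Apply RoPE with per-dimension rescale factors (LongRoPE-style).

    x: (seq, dim) activations; pos: (seq,) absolute positions;
    rescale: (dim // 2,) factors lambda_i. Standard RoPE is all-ones;
    LongRoPE uses non-uniform values to stretch the context window.
    """
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)  # theta_i = base^(-2i/d)
    # Each rotary dimension sees an effectively shrunk position pos / lambda_i.
    angles = (pos[:, None] / rescale[None, :]) * inv_freq[None, :]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

x = np.random.randn(16, 64)
pos = np.arange(16, dtype=np.float64)
# Uniform 4x interpolation as a baseline; LongRoPE would vary this per dimension.
out = rescaled_rope(x, pos, rescale=np.full(32, 4.0))
```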

Sparse KV cache, i.e. H2O ("H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models"): https://arxiv.org/abs/2306.14048. Keeps the KV cache within a fixed budget by retaining recent tokens plus the "heavy hitter" tokens that accumulate the most attention, evicting the rest.
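
A toy sketch of that eviction policy: keep a recent window plus the older tokens with the highest accumulated attention. The function and parameter names are made up for illustration and are not part of vLLM:

```python
import numpy as np

def h2o_keep_set(attn_scores, keep_heavy, keep_recent):
    """Pick which KV-cache positions to keep under an H2O-style policy.

    attn_scores: (num_cached_tokens,) attention mass each cached token
    has accumulated so far; keep_heavy / keep_recent are budget sizes.
    Returns the sorted indices of tokens to retain.
    """
    n = attn_scores.shape[0]
    recent = set(range(max(0, n - keep_recent), n))  # always keep the local window
    older = [i for i in range(n) if i not in recent]
    # "Heavy hitters": older tokens with the largest accumulated scores.
    heavy = sorted(older, key=lambda i: attn_scores[i], reverse=True)[:keep_heavy]
    return sorted(recent | set(heavy))

scores = np.array([5.0, 0.1, 3.2, 0.05, 0.2, 1.5, 0.3, 0.4])
print(h2o_keep_set(scores, keep_heavy=2, keep_recent=3))  # -> [0, 2, 5, 6, 7]
```

In a real serving engine the scores would be maintained incrementally during decoding and eviction applied whenever the cache exceeds its budget.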

chizhang118 commented 2 months ago

https://bytedance.larkoffice.com/wiki/FseAwv4M0iSiWikSv3Ic3wdGnrg