bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Proposal on new memory management based on vAttention #16

Open chizhang118 opened 1 week ago


🚀 The feature, motivation and pitch

This issue proposes a new KV-cache memory management scheme based on vAttention ("vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention", https://arxiv.org/pdf/2405.04437). vAttention keeps each request's KV cache contiguous in virtual memory and uses the GPU's virtual memory support to attach physical pages on demand, instead of PagedAttention-style block tables. This lets unmodified attention kernels operate on contiguous buffers while physical allocation remains dynamic.

Alternatives

No response

Additional context

No response