bd-iaas-us/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
[Feature]: Proposal on new memory management based on vAttention #16
Open
chizhang118 opened 1 week ago

chizhang118 commented 1 week ago
🚀 The feature, motivation and pitch
vAttention: https://arxiv.org/pdf/2405.04437

vAttention proposes keeping each request's KV cache in a contiguous virtual address range and mapping physical memory pages to it on demand using CUDA virtual memory APIs, as an alternative to PagedAttention's block-table-based non-contiguous layout.
Alternatives
No response
Additional context
No response
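To make the proposal concrete, here is a minimal, hypothetical Python sketch of the allocation scheme the vAttention paper describes: a per-request contiguous virtual reservation whose physical pages are attached lazily as tokens arrive. All class and method names below are illustrative inventions, not vLLM or vAttention APIs; the real system would use CUDA driver calls such as cuMemAddressReserve/cuMemCreate/cuMemMap rather than Python objects.

```python
# Hypothetical sketch (not actual vLLM/vAttention code): models the core idea
# of vAttention -- each request's KV cache occupies one *contiguous* virtual
# address range, while fixed-size physical pages are mapped into that range
# on demand as the sequence grows.

PAGE_SIZE = 4  # tokens per physical page (illustrative granularity)


class PhysicalPagePool:
    """Fixed pool of physical pages, analogous to pre-allocated GPU memory."""

    def __init__(self, num_pages: int):
        self.free_pages = list(range(num_pages))

    def allocate(self) -> int:
        if not self.free_pages:
            raise MemoryError("out of physical pages")
        return self.free_pages.pop()

    def free(self, page: int) -> None:
        self.free_pages.append(page)


class VirtualKVCache:
    """Contiguous virtual extent for one request; physical pages mapped lazily."""

    def __init__(self, max_tokens: int, pool: PhysicalPagePool):
        self.capacity = max_tokens        # reserved virtual range (no memory committed yet)
        self.pool = pool
        self.mapped_pages: list[int] = [] # physical page ids, in order
        self.length = 0                   # tokens written so far

    def append_token(self) -> None:
        if self.length >= self.capacity:
            raise MemoryError("virtual reservation exhausted")
        # Map a new physical page only when the previous one is full.
        if self.length % PAGE_SIZE == 0:
            self.mapped_pages.append(self.pool.allocate())
        self.length += 1

    def release(self) -> None:
        # Unmap and return physical pages; the virtual reservation itself is cheap.
        for page in self.mapped_pages:
            self.pool.free(page)
        self.mapped_pages.clear()
        self.length = 0
```

Because the virtual range stays contiguous, attention kernels could index the KV cache with plain pointer arithmetic instead of the indirection through block tables that PagedAttention requires; that is the main simplification the paper argues for.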