bd-iaas-us/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
1 star, 0 forks
Issues
#17  [Misc]: Finding possible more interesting areas (chizhang118, opened 1 week ago, 3 comments)
#16  [Feature]: Proposal on new memory management based on vAttention (chizhang118, opened 1 week ago, 0 comments)
#15  Sparse KV cache (chizhang118, closed 1 week ago, 0 comments)
#14  [Feature]: Support NF4 quant model such as lllyasviel/omost-llama-3-8b-4bits (thesues, opened 3 weeks ago, 0 comments)
#13  [Feature]: Support More Models in QLoRA of VLLM (chenqianfzh, opened 3 weeks ago, 2 comments)
#12  [Feature]: TP support in QLoRA of VLLM (chenqianfzh, opened 3 weeks ago, 1 comment)
#11  [Feature]: Sparse KV cache implementation (chizhang118, opened 1 month ago, 3 comments)
#10  [Feature]: Longrope implmentation (chizhang118, opened 1 month ago, 0 comments)
#9   [Feature]: implementation of QLoRA on VLLM (chenqianfzh, closed 3 weeks ago, 1 comment)
#8   GUFF support (thesues, opened 2 months ago, 1 comment)
#7   Longrope design (chizhang118, closed 1 month ago, 0 comments)
#6   Clarification on Longrope, sparse KV cache and Infini-attention (chizhang118, closed 1 month ago, 1 comment)
#5   Sparse KV cache design (chizhang118, closed 1 month ago, 3 comments)
#4   Design doc for QLora feature (chenqianfzh, closed 2 months ago, 1 comment)
#3   Design doc for cpu offloading feature (XiaoningDing, opened 2 months ago, 13 comments)
#2   Q2 Roadmap (XiaoningDing, opened 2 months ago, 0 comments)
#1   LLM cpu_offload_weight PoC (chenqianfzh, opened 3 months ago, 0 comments)