bd-iaas-us/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
1 star, 0 forks
Issues
#17  [Misc]: Finding possible more interesting areas (chizhang118, opened 1 week ago, 3 comments)
#16  [Feature]: Proposal on new memory management based on vAttention (chizhang118, opened 1 week ago, 0 comments)
#15  Sparse KV cache (chizhang118, closed 1 week ago, 0 comments)
#14  [Feature]: Support NF4 quant model such as lllyasviel/omost-llama-3-8b-4bits (thesues, opened 3 weeks ago, 0 comments)
#13  [Feature]: Support More Models in QLoRA of VLLM (chenqianfzh, opened 3 weeks ago, 2 comments)
#12  [Feature]: TP support in QLoRA of VLLM (chenqianfzh, opened 3 weeks ago, 1 comment)
#11  [Feature]: Sparse KV cache implementation (chizhang118, opened 1 month ago, 3 comments)
#10  [Feature]: Longrope implmentation (chizhang118, opened 1 month ago, 0 comments)
#9   [Feature]: implementation of QLoRA on VLLM (chenqianfzh, closed 3 weeks ago, 1 comment)
#8   GUFF support (thesues, opened 2 months ago, 1 comment)
#7   Longrope design (chizhang118, closed 1 month ago, 0 comments)
#6   Clarification on Longrope, sparse KV cache and Infini-attention (chizhang118, closed 1 month ago, 1 comment)
#5   Sparse KV cache design (chizhang118, closed 1 month ago, 3 comments)
#4   Design doc for QLora feature (chenqianfzh, closed 2 months ago, 1 comment)
#3   Design doc for cpu offloading feature (XiaoningDing, opened 2 months ago, 13 comments)
#2   Q2 Roadmap (XiaoningDing, opened 2 months ago, 0 comments)
#1   LLM cpu_offload_weight PoC (chenqianfzh, opened 3 months ago, 0 comments)