bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: Implementation of QLoRA on vLLM #9

Closed: chenqianfzh closed this issue 3 weeks ago

chenqianfzh commented 2 months ago

🚀 The feature, motivation and pitch

https://bytedance.larkoffice.com/wiki/ZKCQwGz7DiQbxtkh2lGc7A2hnVh

Alternatives

No response

Additional context

No response

chenqianfzh commented 3 weeks ago

https://github.com/vllm-project/vllm/pull/4776 has been merged, which resolves this feature request.
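For context, the core numeric idea behind QLoRA is to keep the frozen base weights in 4-bit NF4 (a code book of 16 levels denser near zero, matched to normally distributed weights, quantized blockwise with an absmax scale per block) and apply a low-rank LoRA update in higher precision. The sketch below is illustrative only, not vLLM's or bitsandbytes' implementation; the helper names (`nf4_quantize`, `nf4_dequantize`, `qlora_forward`) are hypothetical, and the code-book constants are reproduced approximately from the QLoRA reference.

```python
import numpy as np

# Approximate NF4 code book: 16 levels in [-1, 1], denser near zero
# (values as published with the QLoRA reference implementation).
NF4_LEVELS = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_quantize(w, block_size=64):
    """Blockwise absmax quantization of a weight matrix to 4-bit NF4 indices."""
    flat = w.reshape(-1, block_size)
    absmax = np.abs(flat).max(axis=1, keepdims=True)  # one fp scale per block
    normed = flat / absmax                            # values now in [-1, 1]
    # Round each value to the nearest NF4 level; store only the 4-bit index.
    idx = np.abs(normed[..., None] - NF4_LEVELS).argmin(axis=-1)
    return idx.astype(np.uint8), absmax

def nf4_dequantize(idx, absmax, shape):
    """Recover approximate weights: look up each level, rescale per block."""
    return (NF4_LEVELS[idx] * absmax).reshape(shape)

def qlora_forward(x, w_q, lora_a, lora_b, scale=1.0):
    """QLoRA layer: x @ dequant(W) plus a low-rank LoRA update x @ A @ B."""
    idx, absmax, shape = w_q
    w_hat = nf4_dequantize(idx, absmax, shape)
    return x @ w_hat + scale * (x @ lora_a) @ lora_b

# Demo: quantize a random weight matrix and check reconstruction quality.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
idx, absmax = nf4_quantize(w)
w_hat = nf4_dequantize(idx, absmax, w.shape)
```

In vLLM terms, the merged PR lets the engine load the base model in this 4-bit form via bitsandbytes while LoRA adapters stay in higher precision, which is what makes serving QLoRA-style checkpoints memory-efficient.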