Closed: Wendong-Fan closed this 2 months ago
vLLM in #448
Required prerequisites
Motivation
Study efficiency-improvement solutions and integrate the necessary ones
Solution
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
Alternatives
No response
Additional context
No response