Open · CoolFish88 opened this issue 1 month ago
Concise Description:
vLLM v0.6.0 provides a 2.7x throughput improvement and a 5x latency reduction over the previous version (v0.5.3).
DLC image/dockerfile:
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-lmi11.0.0-cu124
763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-neuronx-sdk2.19.1
Is your feature request related to a problem? Please describe.
Improve the performance of LMI containers.
Describe the solution you'd like
Update the vLLM library in LMI containers to v0.6.0.

We are planning a release that will include vLLM 0.6.2 within the next 2 weeks. In the meantime, you can try providing a requirements.txt with vllm==0.6.x to leverage a later version of vLLM that way. If you go this route, you should also set the OPTION_ROLLING_BATCH=vllm environment variable to force usage of vLLM.
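
A minimal sketch of that interim workaround (pinning a newer vLLM via requirements.txt and forcing the vLLM backend), assuming a SageMaker deployment via the SageMaker Python SDK. Only the image URI and the OPTION_ROLLING_BATCH setting come from this issue; the pinned version, S3 path, IAM role, instance type, and endpoint name are placeholders.

```
# requirements.txt packaged alongside the model artifacts
vllm==0.6.2
```

```python
# Hedged sketch: deploy the LMI image with a vLLM pin and the rolling-batch
# override. All identifiers below except the image URI are placeholders.
from sagemaker.model import Model

model = Model(
    image_uri=(
        "763104351884.dkr.ecr.us-west-2.amazonaws.com/"
        "djl-inference:0.29.0-lmi11.0.0-cu124"
    ),
    # Tarball assumed to contain serving.properties and the requirements.txt
    # above, which the container pip-installs at startup.
    model_data="s3://my-bucket/my-model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    env={
        # Force usage of the vLLM rolling-batch backend, per the suggested
        # workaround.
        "OPTION_ROLLING_BATCH": "vllm",
    },
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    endpoint_name="lmi-vllm-upgrade-test",
)
```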