
lmi cpu container with vLLM #2009


lanking520 commented 1 month ago

Description

Support CPU container builds for vLLM-based LLM inference. Tested with Hermes-2-Pro-Llama-3-8B (config below); it worked, but inference was extremely slow.
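
For context, a minimal standalone sketch of the same workload through the plain vLLM Python API (not DJL's rolling-batch handler), assuming a CPU build of vLLM is installed; the prompt and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

# Assumes vLLM was built for CPU (e.g. with VLLM_TARGET_DEVICE=cpu).
# Same model id as in the serving.properties below.
llm = LLM(model="NousResearch/Hermes-2-Pro-Llama-3-8B", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is Deep Learning?"], params)
print(outputs[0].outputs[0].text)
```

Running this directly helps separate raw vLLM CPU throughput from any container or serving overhead.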

The serving.properties used:

engine=Python
option.rolling_batch=vllm
option.model_id=NousResearch/Hermes-2-Pro-Llama-3-8B
option.tensor_parallel_degree=1
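
Once the container is up, a quick smoke test from Python; this sketch assumes the default LMI HTTP front end on localhost:8080 and its /invocations route, with an illustrative payload:

```python
import requests

# Hypothetical smoke test against a locally running LMI container.
# Port, route, and payload schema are assumptions based on the default
# LMI front end, not something stated in this issue.
resp = requests.post(
    "http://localhost:8080/invocations",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=600,  # CPU inference is extremely slow, so allow a long timeout
)
resp.raise_for_status()
print(resp.json())
```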