Open · lanking520 opened 1 month ago
Support CPU container build for vLLM-based LLM inference. Tested with Llama-3-8B; it works, but is extremely slow.
```
engine=Python
option.rolling_batch=vllm
option.model_id=NousResearch/Hermes-2-Pro-Llama-3-8B
option.tensor_parallel_degree=1
```
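For context, a minimal sketch of the equivalent direct vLLM invocation on CPU (outside the DJL container), assuming a vLLM wheel built with the CPU backend (`VLLM_TARGET_DEVICE=cpu`); the prompt and sampling parameters below are illustrative only:

```python
# Sketch only: direct vLLM call on a CPU-backend build, not the DJL handler.
from vllm import LLM, SamplingParams

# Model id and parallel degree taken from the serving config above.
llm = LLM(
    model="NousResearch/Hermes-2-Pro-Llama-3-8B",
    tensor_parallel_size=1,  # matches option.tensor_parallel_degree=1
)
params = SamplingParams(temperature=0.8, max_tokens=64)

# Generate a single illustrative completion and print it.
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```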