Open · lanking520 opened 1 month ago
Support CPU container build for vLLM-based LLM inference. Tested with Llama-3-8B; it works, but is extremely slow.
```
engine=Python
option.rolling_batch=vllm
option.model_id=NousResearch/Hermes-2-Pro-Llama-3-8B
option.tensor_parallel_degree=1
```
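For context, a minimal sketch of the equivalent direct vLLM invocation on CPU (outside the DJL container), assuming a vLLM wheel built with the CPU backend (`VLLM_TARGET_DEVICE=cpu`); the prompt and sampling parameters below are illustrative only:

```python
# Sketch only: direct vLLM call on a CPU-backend build, not the DJL handler.
from vllm import LLM, SamplingParams

# Model id and parallel degree taken from the serving config above.
llm = LLM(
    model="NousResearch/Hermes-2-Pro-Llama-3-8B",
    tensor_parallel_size=1,  # matches option.tensor_parallel_degree=1
)
params = SamplingParams(temperature=0.8, max_tokens=64)

# Generate a single illustrative completion and print it.
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```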