DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
MIT License
Error executing method determine_num_available_blocks: vLLM multi node fails for both DeepSeek-Coder-V2-Instruct and DeepSeek-Coder-V2-Lite-Instruct #76
First, has the DeepSeek team tried running these models on vLLM multi-node? I am running through Ray on 2 nodes x 8 V100 GPUs in half precision (float16).
These are the launch parameters:
CUDA_LAUNCH_BLOCKING=1 OMP_NUM_THREADS=1 vllm serve deepseek-ai/DeepSeek-Coder-V2-Instruct --tensor-parallel-size 16 --dtype half --trust-remote-code --enforce-eager --enable-chunked-prefill=False
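For anyone reproducing this, here is a minimal sketch of the arithmetic behind `--tensor-parallel-size 16`. It assumes no pipeline parallelism is configured, so the tensor-parallel size has to cover every GPU in the Ray cluster. The helper name `expected_tensor_parallel_size` and the constants are hypothetical and only illustrate the setup described above; they are not part of vLLM.

```python
# Sanity check for the parallelism setting used in the command above.
# Assumption: pipeline parallelism is not used, so the tensor-parallel
# size should equal the total number of GPUs across all nodes.

def expected_tensor_parallel_size(num_nodes: int, gpus_per_node: int) -> int:
    """Return the total GPU count for a homogeneous cluster (hypothetical helper)."""
    if num_nodes < 1 or gpus_per_node < 1:
        raise ValueError("num_nodes and gpus_per_node must be positive")
    return num_nodes * gpus_per_node

NUM_NODES = 2              # from the setup described in this issue
GPUS_PER_NODE = 8          # V100 GPUs per node
TENSOR_PARALLEL_SIZE = 16  # value passed to --tensor-parallel-size

world_size = expected_tensor_parallel_size(NUM_NODES, GPUS_PER_NODE)
if world_size != TENSOR_PARALLEL_SIZE:
    raise SystemExit(
        f"Mismatch: cluster has {world_size} GPUs, "
        f"but tensor parallel size is {TENSOR_PARALLEL_SIZE}"
    )
print(f"Tensor parallel size {TENSOR_PARALLEL_SIZE} matches {world_size} GPUs")
```

The check passes for this setup, so the failure is probably not a simple GPU-count mismatch.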
DeepSeek-Coder-V2-Lite-Instruct also fails at determine_num_available_blocks, but it reports an NCCL error instead:
(RayWorkerWrapper pid=23558, ip=10.0.128.18) ERROR 07-28 13:53:40 worker_base.py:382] RuntimeError: NCCL Error 3: internal error - please report this issue to the NCCL developers