Hi @localmind-ai, this issue comes from vLLM. To run on multiple GPUs, you need to set this environment variable first:
export VLLM_WORKER_MULTIPROC_METHOD=spawn
You can find more details here: https://github.com/vllm-project/vllm/issues/6152
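For reference, the variable can also be set inline when launching. A sketch combining it with the launch command from this issue (the env var is the one above; the flags just mirror yours):

# Force the "spawn" multiprocessing start method so CUDA is not
# re-initialized in forked worker processes, then launch with TP=8:
VLLM_WORKER_MULTIPROC_METHOD=spawn python3 server_vllm.py --model "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8080 --tensor-parallel-size 8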
Ah, thanks a lot @khai-meetkai - will try that out!
First of all, thanks for the great new release of Functionary Medium 3.1 based on Llama 3.1 70B! Looking forward to trying it out.
Unfortunately, we have an issue when running server_vllm.py: we get some CUDA multiprocessing errors that don't appear in regular vLLM (tested on the same server). We launch with
python3 server_vllm.py --model "meetkai/functionary-medium-v3.1" --host 0.0.0.0 --port 8080 --tensor-parallel-size 8
and this is the full error log when launching the script: