docker run --gpus 'all' fails: is multi-GPU not supported?

On a server with two A40 GPUs, I deploy the service on GPU with the following command:

    docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12 --trust-remote-code

It fails with the following error:

    2024-01-19T08:15:44.995533Z ERROR warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
    Error: Warmup(Generation("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!"))
    2024-01-19T08:15:45.052858Z ERROR text_generation_launcher: Webserver Crashed
    2024-01-19T08:15:45.052873Z INFO text_generation_launcher: Shutting down shards
    2024-01-19T08:15:45.395141Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
    Error: WebserverFailed
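One way to narrow this down, offered as a sketch and not a confirmed fix: the command requests a single shard (`--num-shard 1`) while exposing both GPUs, so restricting the container to one device should keep every tensor on `cuda:0`. The `GPU_ID` variable and the `RUN` switch are hypothetical names introduced here, and device index `0` is an assumption. All other arguments are copied unchanged from the command above.

```shell
#!/usr/bin/env bash
# Sketch: expose only one GPU to the container so that a single shard
# cannot place tensors on both cuda:0 and cuda:1.
# GPU_ID=0 is an assumption; pick the index of the card you want to use.
GPU_ID="${GPU_ID:-0}"

# Same arguments as the original command, except for --gpus.
cmd=(
  docker run --gpus "device=${GPU_ID}" --shm-size 1g -p 9090:80
  -v "$HOME/models:/data"
  --env LOG_LEVEL="info,text_generation_router=debug"
  ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3
  --model-id /data/CodeShell-7B-Chat --num-shard 1
  --max-total-tokens 5000 --max-input-length 4096
  --max-stop-sequences 12 --trust-remote-code
)

# Print the command for review; set RUN=1 to actually start the container.
printf '%s ' "${cmd[@]}"; echo
if [ "${RUN:-0}" = "1" ]; then
  "${cmd[@]}"
fi
```

If the container starts cleanly this way, the failure is likely tied to the model seeing both devices and not to Docker's GPU passthrough itself. Whether `--num-shard 2` works for this model with both cards has not been verified here.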

WisdomShell / codeshell-vscode

docker run --gpus 'all' fails: is multi-GPU not supported? #53