I changed the code myself as suggested above. However, there seems to be an issue with my dual-GPU setup: I get a segfault after the model is loaded and inference is supposed to start. Maybe someone with a similar setup (integrated graphics might work too, but I don't have any that ROCm supports) can test this.
Expected Behavior
When using e.g. `--cuda-device 1`, I expect ComfyUI to use the device with ID 1.
Actual Behavior
No matter what I pass to `--cuda-device`, ComfyUI uses the device with ID 0 in every case.
Steps to Reproduce
Debug Logs
Other
For reference, I have two AMD RX 7900 XT cards in my system. With `--cuda-device 1`, only the 2nd GPU is supposed to be exposed to torch/CUDA. But since the switch has no effect (the environment variable is ignored by PyTorch+ROCm), torch falls back to the default CUDA device, cuda0, i.e. my 1st GPU instead of the 2nd. Below is an example showing that the correct environment variable yields the expected result.
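To make the failure easy to check, here is a minimal sketch of my own (not part of the issue template) that compares what torch sees under each environment variable. It assumes a ROCm build of PyTorch and two GPUs, and the variable has to be set before torch is imported:

```python
import os

# Set exactly ONE of these before importing torch, then rerun the script.
# On my PyTorch+ROCm setup the first line is ignored, the second is honored.
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # ignored by PyTorch+ROCm here
os.environ["HIP_VISIBLE_DEVICES"] = "1"     # works: only GPU 1 is exposed

import torch

# With HIP_VISIBLE_DEVICES=1 this should report a single visible device,
# and index 0 now refers to the physical GPU with ID 1.
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
```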
Example: setting `CUDA_VISIBLE_DEVICES` to 1, as implemented by `--cuda-device 1` in https://github.com/comfyanonymous/ComfyUI/blob/2d28b0b4790e3f6c2287be49d9872419eadfe5bb/main.py#L73, has no effect. However, setting `HIP_VISIBLE_DEVICES` instead actually works. Sadly, the PyTorch and ROCm documentation is misleading in this regard: one would assume the two environment variables are interchangeable, but despite what the ROCm documentation claims (see links below), that is not the case for PyTorch.
https://pytorch.org/docs/stable/notes/hip.html
https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html#cuda-visible-devices
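For anyone wanting to patch this locally, here is a minimal sketch of the change, assuming the argument lands in `args.cuda_device` as in the upstream `main.py` linked above (this is not the exact upstream diff). Exporting both variables covers CUDA and ROCm builds alike:

```python
import os

def set_visible_device(device_id: int) -> None:
    # CUDA builds of PyTorch read this variable...
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    # ...but ROCm builds (per this report) only honor this one,
    # so export both to cover either backend.
    os.environ["HIP_VISIBLE_DEVICES"] = str(device_id)

set_visible_device(1)  # equivalent to --cuda-device 1

import torch  # must happen after the env vars are set
```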