city96 / ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models
Apache License 2.0
966 stars, 60 forks

Device selection (multi GPU) #96

Open tiko13 opened 2 months ago

tiko13 commented 2 months ago

Would it be possible to add device selection to the loaders, similar to the one in https://github.com/neuratech-ai/ComfyUI-MultiGPU?

I have tried to 'port' that feature using similar code, but without success: even when the device is correctly selected as cuda:0, it still tries to load the GGUF models onto cuda:1 for some reason.

torch.cuda.set_device(device) doesn't work either, even when hardcoded.

There is probably something downstream overriding this selection, but I cannot figure out what it is.

city96 commented 2 months ago

There is probably something downstream overriding this selection, but I cannot figure out what it is

That makes two of us. Setting CUDA_VISIBLE_DEVICES=1 just dumps the T5 model onto the CPU on my P40 for some reason, though I haven't had time to investigate why. (This only happens with the GGUF dual CLIP loader, not the normal one.)

Weirder still, my crappy bootleg offload script seems to work fine lol.
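For context on the CUDA_VISIBLE_DEVICES=1 experiment: when that variable is set, CUDA renumbers the visible GPUs from zero inside the process, so physical GPU 1 shows up as cuda:0, and anything that still asks for cuda:1 is pointing at a device that is not visible at all. The helper below is a hypothetical pure-Python sketch of that mapping (not part of ComfyUI or ComfyUI-GGUF). It assumes the variable holds plain comma-separated integer indices and does not handle the GPU UUID form that CUDA also accepts.

```python
def visible_to_physical(cuda_visible_devices, logical_index):
    """Map a process-local cuda:N index to the GPU index it refers to.

    Sketch only: handles comma-separated integer indices, not UUIDs or
    MIG identifiers, which real CUDA also accepts.
    """
    if cuda_visible_devices is None:
        # Variable unset: all GPUs are visible and indices are not remapped.
        return logical_index
    # e.g. "2,0" -> [2, 0]; int() tolerates surrounding spaces like " 0".
    visible = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    if not 0 <= logical_index < len(visible):
        raise IndexError(
            f"cuda:{logical_index} is not visible with "
            f"CUDA_VISIBLE_DEVICES={cuda_visible_devices!r}"
        )
    # The N-th visible GPU becomes cuda:N inside the process.
    return visible[logical_index]
```

For example, visible_to_physical("2,0", 1) returns 0: with GPUs 2 and 0 exposed in that order, cuda:1 inside the process is GPU 0.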

tiko13 commented 1 month ago

I will try to look at it more when I get time. Yet another 3090 is headed my way, so this is quickly becoming a bigger issue :D

city96 commented 1 month ago

Question: can you check whether https://github.com/city96/ComfyUI-GGUF/commit/d2aaeb0f138320cb2b1481d00c79ee63d7cfe81b fixes this? You'll have to update ComfyUI too, but it does at least fix the CPU issue. It might also make it work with the node set you linked above.

(Note: this only really fixes the text encoder. I suspect running with highvram will still break things, since the offload device will then also be the GPU.)

tiko13 commented 1 month ago

I have tried it now (with no VRAM flag). For some reason, the UNET is now always loaded onto the cuda:1 device, even if I explicitly set it to cuda:0.

As a result, CLIP and VAE run on cuda:0, but the UNET always goes to cuda:1 no matter what.

city96 commented 1 month ago

Yeah, that was my last bet lol. Actually, have you tried running with CUDA_DEVICE_ORDER=PCI_BUS_ID? I doubt it would help, but who knows; it might be something to do with the device order.
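CUDA reads CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES when it initializes, so both have to be set before ComfyUI (and torch) start, not from inside a running node. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes CUDA enumerate GPUs in PCI bus order, matching nvidia-smi, instead of the default FASTEST_FIRST. Below is a minimal launcher sketch, assuming ComfyUI is started with `python main.py` from its checkout directory; gpu_env and launch_comfyui are hypothetical names, not existing ComfyUI functions.

```python
import os
import subprocess
import sys


def gpu_env(gpu_ids, base=None):
    """Return a copy of the environment that pins a child process to gpu_ids."""
    env = dict(os.environ if base is None else base)
    # Enumerate GPUs by PCI bus ID so indices match nvidia-smi
    # (CUDA's default order is FASTEST_FIRST).
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only these GPUs are visible; they are renumbered cuda:0, cuda:1, ...
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_comfyui(comfy_dir, gpu_ids):
    """Start ComfyUI in a child process that only sees gpu_ids.

    Assumes the entry point is main.py in comfy_dir.
    """
    return subprocess.Popen(
        [sys.executable, "main.py"], cwd=comfy_dir, env=gpu_env(gpu_ids)
    )
```

For example, launch_comfyui("/path/to/ComfyUI", [0]).wait() would start an instance that only sees the first card in PCI bus order. Running a separate process per GPU sidesteps in-process device selection entirely, at the cost of not sharing models between the instances.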

tiko13 commented 1 month ago

So it's even more complicated than I thought. When I freshly start Comfy and set all nodes to cuda:0, everything runs fine on that device. As soon as I load the encoder onto cuda:1, the UNET is forced there too. But when I change the CLIP and all other nodes back to cuda:0, CLIP and VAE load correctly while the UNET reloads onto cuda:1 anyway, and the only fix is restarting Comfy.

amssss0 commented 3 weeks ago

+1