picobyte opened this issue 1 year ago
Huh, interesting idea to round-robin the GPUs. It might even work on my cards (P40s): they don't reach shutdown temperature, but they do thermal-throttle.
Anyway, I don't know if custom nodes get notified when a workflow gets cancelled, but I'll try to figure something out. I can only realistically mess with my multi-GPU setup on the weekend, so I'll try to get back to you on this.
(There's a "rewrite" branch, but I'm not sure it fixes either of your issues.)
For the temperature issue, I'll try a workaround using temperature protection. If you're interested, here is my attempted workflow for switching GPUs: multi_gpu_test.json (currently not working). This can probably be done better; I am just starting with ComfyUI.
However, I also wonder what the benefit is of having one workflow control the GPUs versus running ComfyUI multiple times. I think it would be better if dedicated tasks were dispatched to distinct GPUs: for example, one GPU for adding noise, another for the UNet, one for reconstruction, and maybe one for preview image generation[1], or something like that. Alternatively, subsequent cycles could run on distinct GPUs. This is just my (naive) idea of the ideal distribution of work. Or perhaps averaging(?) cycles that run in parallel.
[1] https://huggingface.co/blog/stable_diffusion
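To make the two layouts concrete, here is a small scheduling sketch. The stage names (`add_noise`, `unet`, `vae_decode`, `preview`) are hypothetical labels for the steps listed above, not ComfyUI node names, and the functions only compute which GPU index each stage or cycle would go to; actually moving tensors between devices is not shown.

```python
from typing import Dict, List

# Hypothetical stage labels for the pipeline described above (not ComfyUI node names).
STAGES = ["add_noise", "unet", "vae_decode", "preview"]


def assign_stages(stages: List[str], gpus: List[int]) -> Dict[str, int]:
    """Layout 1: each pipeline stage gets its own GPU, wrapping around
    if there are fewer GPUs than stages."""
    if not gpus:
        raise ValueError("need at least one GPU")
    return {stage: gpus[i % len(gpus)] for i, stage in enumerate(stages)}


def assign_cycles(num_cycles: int, gpus: List[int]) -> List[int]:
    """Layout 2: consecutive cycles alternate between GPUs (round-robin)."""
    if not gpus:
        raise ValueError("need at least one GPU")
    return [gpus[c % len(gpus)] for c in range(num_cycles)]


if __name__ == "__main__":
    print(assign_stages(STAGES, [0, 1, 2, 3]))
    # {'add_noise': 0, 'unet': 1, 'vae_decode': 2, 'preview': 3}
    print(assign_cycles(6, [0, 1, 2, 3]))
    # [0, 1, 2, 3, 0, 1]
```

Splitting stages across GPUs means latents have to be copied between devices at every hand-off, which is one reason running a separate ComfyUI instance per GPU can end up simpler.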
Okay, so I tried making a round-robin node to switch the URLs, but `//` is interpreted as a comment... I'll get back to this once I find out where that logic lives in ComfyUI.
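A minimal sketch of such a node, assuming the usual ComfyUI custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`). The node name `RoundRobinURL` and the helper `parse_hosts` are hypothetical. It sidesteps the `//` problem by having the user enter only `host:port` entries and adding the `http://` scheme in Python. Returning NaN from `IS_CHANGED` is assumed to keep ComfyUI from reusing a cached output, so the node advances on every run.

```python
import itertools
from typing import Dict, Iterator, List, Tuple


def parse_hosts(text: str) -> List[str]:
    """Turn one 'host:port' per line into full URLs.

    The scheme is added here, so the widget text never needs to contain '//'."""
    urls = []
    for line in text.splitlines():
        host = line.strip()
        if not host:
            continue  # skip blank lines
        if "://" not in host:
            host = "http://" + host
        urls.append(host.rstrip("/"))
    return urls


class RoundRobinURL:
    """Hypothetical ComfyUI node: returns the next server URL on each run."""

    # One rotation per distinct host list, shared across node instances.
    _cycles: Dict[Tuple[str, ...], Iterator[str]] = {}

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"hosts": ("STRING", {"multiline": True, "default": "127.0.0.1:8188"})}}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "next_url"
    CATEGORY = "utils"

    @classmethod
    def IS_CHANGED(cls, hosts):
        # NaN never equals itself; assumed to force re-execution every run.
        return float("nan")

    def next_url(self, hosts: str):
        urls = tuple(parse_hosts(hosts))
        if not urls:
            raise ValueError("no hosts given")
        if urls not in self._cycles:
            self._cycles[urls] = itertools.cycle(urls)
        return (next(self._cycles[urls]),)


NODE_CLASS_MAPPINGS = {"RoundRobinURL": RoundRobinURL}
```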
As for cancelling, I added some simple logic that clears the previous job before a new one starts. This isn't optimal, since the job keeps running even after you cancel it. I could break this out into a separate "cancel all jobs" node, but it would be much cleaner if custom nodes were notified when a workflow is cancelled or interrupted. I've already asked comfy, so for now we'll just have to wait.
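A rough sketch of the "cancel all jobs" idea, assuming each remote instance exposes ComfyUI's HTTP routes `POST /queue` (with `{"clear": true}` to drop pending prompts) and `POST /interrupt` (to stop the prompt that is currently running). The function names are hypothetical. Building the requests is kept separate from sending them, so they can be checked without a live server.

```python
import json
import urllib.request
from typing import List


def build_cancel_requests(base_url: str, interrupt: bool = True) -> List[urllib.request.Request]:
    """Requests that empty a remote ComfyUI queue and optionally stop its running job."""
    base = base_url.rstrip("/")
    headers = {"Content-Type": "application/json"}
    reqs = [
        urllib.request.Request(
            base + "/queue",
            data=json.dumps({"clear": True}).encode("utf-8"),
            headers=headers,
            method="POST",
        )
    ]
    if interrupt:
        # Clearing the queue alone leaves the current prompt running;
        # /interrupt is what stops it.
        reqs.append(
            urllib.request.Request(
                base + "/interrupt",
                data=json.dumps({}).encode("utf-8"),
                headers=headers,
                method="POST",
            )
        )
    return reqs


def cancel_all(urls: List[str], timeout: float = 5.0) -> None:
    """Send the cancel requests to every remote instance (needs live servers)."""
    for url in urls:
        for req in build_cancel_requests(url):
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                resp.read()
```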
(Sorry, the README is still a mess. I'll try to clean it up, and if everything works I'll merge the rewrite branch into the main one.)
I have a Tesla M10 (4 GPUs, passively cooled) that overheats easily. At 95°C a GPU becomes unusable until reboot. Controlling which GPU is active and which one cools down via your extension has several problems:
`new_prompt[i]` being `None`.
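As a sketch of the temperature-protection workaround, the dispatcher could read GPU temperatures with `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits` and only send work to the coolest GPU below a threshold. The 85°C limit is an assumed safety margin below the 95°C lock-up point, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict, Optional

# Assumed safety margin below the 95 C point at which the M10 locks up.
MAX_TEMP_C = 85


def parse_temps(csv_text: str) -> Dict[int, int]:
    """Parse 'index, temperature' lines as printed by
    nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits"""
    temps = {}
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        try:
            temps[int(fields[0])] = int(fields[1])
        except (ValueError, IndexError):
            continue  # e.g. "[N/A]" when the sensor cannot be read
    return temps


def pick_gpu(temps: Dict[int, int], max_temp: int = MAX_TEMP_C) -> Optional[int]:
    """Return the coolest GPU under max_temp, or None if all are too hot."""
    candidates = {i: t for i, t in temps.items() if t < max_temp}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def read_temps() -> Dict[int, int]:
    """Query the driver for current temperatures (requires nvidia-smi on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_temps(out)
```

`pick_gpu(read_temps())` returns `None` when every GPU is over the limit; in that case the caller would wait and poll again rather than queue the next job.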