Closed: jussitus closed this issue 3 months ago
I can't use it either.
I don't hit an OOM, but at the KSampler stage GPU activity spikes to 100% even before the first step, and it just gets stuck.
I tried with --reserve-vram 1.2.
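(For reference, I pass the flag on the launch command, something like python main.py --reserve-vram 1.2 run from the ComfyUI folder; that exact invocation is just an example and will differ for other setups.)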
What are your workflow and model files?
I can confirm this issue.
Just applying the InstantX Canny ControlNet with flux-dev-fp8 causes an OOM
(though I'm using CFGGuider instead of BasicGuider).
https://huggingface.co/InstantX/FLUX.1-dev-Controlnet-Canny/tree/main
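For reference, my graph is basically the standard Flux ControlNet setup; roughly, from memory (so the node wiring below is an approximation, not an exact copy of my workflow):

Load Diffusion Model (flux-dev-fp8) + DualCLIPLoader -> CLIP Text Encode (positive / negative)
Load ControlNet Model (InstantX Canny) -> Apply ControlNet (Canny-preprocessed image + conditioning)
CFGGuider (model + conditioning) -> SamplerCustomAdvanced -> VAE Decode -> Save Image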
I tried the Depth one on the latest commit ( https://github.com/comfyanonymous/ComfyUI/commit/ea3f39bd6906dd455c867198d4d94152e76ad074 ), and it works with GGUF models: https://huggingface.co/Shakker-Labs/FLUX.1-dev-ControlNet-Depth
By the way, the Depth ControlNet shows a strange behavior that is only visible in the console:
it starts to generate, stops before the first step, loads some more models, and then finally completes all the steps.
It seems to work fine anyway.
got prompt
Requested to load ControlNetFlux
Loading 1 new model
loaded partially 2311.0572191467286 2310.966796875 0
loaded partially 3378.4771410217286 3378.38671875 0
0%| | 0/12 [00:00<?, ?it/s]Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
loaded partially 5566.4527809143065 5565.73828125 0
loaded partially 342.82582778930646 342.4921875 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [01:35<00:00, 7.93s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 111.03 seconds
With more testing I noticed other issues with the Depth ControlNet:
After a few generations, times get much slower (about 2x).
And sometimes I get this error:
!!! Exception during processing !!! Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
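That error is the usual PyTorch symptom of a module being left on the CPU while its input is already on the GPU. A minimal standalone repro of the same message (needs a CUDA build of PyTorch; this is just an illustration of the error class, not the actual ComfyUI code path):

import torch
layer = torch.nn.Linear(4, 4)           # weights and bias stay on the CPU
x = torch.randn(1, 4, device="cuda")    # input tensor is already on the GPU
layer(x)  # raises: Expected all tensors to be on the same device ... cpu and cuda:0 (argument mat1, wrapper_CUDA_addmm)

So my guess is that part of the partially offloaded ControlNet is still sitting in system RAM when the sampler calls it.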
The flux controlnet OOM should be fixed for most people now.
For me the Canny one improved somewhat, but it still does not work.
Now I can generate the first step successfully, but then GPU activity goes to 100% and it slows down.
I have 8 GB of VRAM and used --reserve-vram 1.2, tested with a GGUF model (Q4_K_S).
EDIT: after a few more tests, it works!
--reserve-vram 1.6 - still not enough, but better: I can get 3 or 4 steps done. VRAM usage was at 7.8 GB.
--reserve-vram 1.8 - it works. VRAM usage reached 7.8 GB (at about step 14 of 20), but it stayed fast all the way.
--reserve-vram 2.0 - it works. VRAM usage is only at 7.1 GB.
--reserve-vram 2.4 - it works, even with 2 LoRAs.
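(For context, my understanding is that --reserve-vram N just tells Comfy to keep roughly N GB of VRAM free for the OS and other software, so raising it makes Comfy offload more of the model to system RAM; on my 8 GB card that's why ~1.8-2.0 turns out to be the sweet spot. The working run above is simply python main.py --reserve-vram 1.8 with the same Q4_K_S GGUF model.)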
The OOM issue was fixed by https://github.com/comfyanonymous/ComfyUI/commit/b643eae08b7f0c8eb69b77bd61e31009bfb325b9
Expected Behavior
.
Actual Behavior
Using the InstantX Canny ControlNet fills up my VRAM at the KSampler step and freezes my system (Fedora, running Comfy in a podman container). I can prevent the crash by using
--reserve-vram 4.0
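(In the container I pass the flag on the launch command, roughly like this; the image name and mounts below are placeholders rather than my exact setup:
podman run --rm -it --device nvidia.com/gpu=all -p 8188:8188 -v ./models:/app/ComfyUI/models localhost/comfyui:latest python main.py --listen --reserve-vram 4.0)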
(the InstantX ControlNet is ~3 GB).
Steps to Reproduce
.
Debug Logs
Other
No response