Open axel578 opened 1 week ago
Update your PyTorch to at least 2.3 and your NVIDIA drivers to the latest.
I updated to 2.3.1 and the latest driver, and the exact same issue occurs: very high VRAM usage (8 GB for a 64-rank LoRA) and extremely slow ControlNet generation.
Are you sure? Try downloading the latest standalone package from the README.
Downloaded the standalone package, updated everything, and now it's 34.99 s/it... previously it was 1.24 it/s. My system has a 4070 Ti, 64 GB DDR5 RAM, and a Core i7-14700K. How do I revert to the previous version?
Also noticed that loading more than one LoRA file increases the generation time by 10 to 15 seconds per iteration.
2080 Ti 11 GB, Windows. Made sure PyTorch and NVIDIA drivers were updated, reinstalled from the README, and got the same result as my updated install.
I've had this issue for a little while now. I wish I knew which update changed it, but I didn't keep track (the Comfy version number isn't in plain sight). If I had to guess, I'd say it was within the last three updates: it didn't happen with this last one, and didn't happen with the one before it, so it was before that. Same story as the rest: I had no problems generating images with Flux and a LoRA, but now having 1 LoRA kills it. Roughly 14 minutes for a single image. It does work, just very, very slow.
I looked at the issues and saw it was reported 4 or 5 days ago, so I've just been patient. As a dev myself I recognize this "Are you sure?", so just letting you know there are others who have followed your steps and this odd issue persists.
I'm experiencing the same issue. The official ControlNet workflow runs fine with some VRAM to spare. However, as soon as I add an 18M LoRA to the workflow, VRAM usage immediately explodes.
Allocation on device 0 would exceed allowed memory. (out of memory)
Currently allocated      : 22.47 GiB
Requested                : 72.00 MiB
Device limit             : 23.99 GiB
Free (according to CUDA) : 0 bytes
PyTorch limit (set by user-supplied memory fraction) : 17179869184.00 GiB
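The numbers in that log are worth unpacking. A small Python sketch (the values are copied from the OOM message; everything else is illustrative):

```python
# Back-of-the-envelope check of the OOM message above; values copied from it.
GIB = 1024 ** 3
MIB = 1024 ** 2

device_limit = 23.99 * GIB   # total VRAM the driver reports
allocated = 22.47 * GIB      # memory PyTorch has already allocated
requested = 72 * MIB         # the allocation that failed
cuda_free = 0                # bytes CUDA itself reports as free

# On paper there is ~1.5 GiB of headroom, far more than the 72 MiB request...
nominal_headroom = device_limit - allocated
print(f"nominal headroom: {nominal_headroom / GIB:.2f} GiB")

# ...but CUDA reports 0 bytes free: the gap is held by the caching allocator,
# other processes, or fragmentation, so even a small request fails.
print("fits nominally:", requested <= nominal_headroom)  # True
print("fits per CUDA: ", requested <= cuda_free)         # False
```

The gap between the ~1.5 GiB of nominal headroom and the 0 bytes CUDA actually has free is consistent with the shared-memory/allocator explanation given later in the thread.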
Can you check if things have improved on the latest commit?
I have the same problem: I can't use two LoRAs at the same time, it slows down a lot on a 4070 Ti Super. With 1 LoRA it is also slower than normal. I'm using flux.dev16.
Can you check if things have improved on the latest commit?
It's the exact same issue on the latest commit you did for fp8 LoRA. I used your 2.0 version; the exact same issue, maybe even a little slower.
The issue you're experiencing is related to shared memory. The best solution is to configure your GPU driver so it does not fall back to system RAM as shared memory once VRAM fills up. If this isn't possible, you should use the --disable-smart-memory option to minimize VRAM usage. The next option to consider is the --reserve-memory option.
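As a sketch, assuming a standard source install launched via main.py (the portable builds wrap this in a .bat file), those flags go on the launch command:

```shell
# Hypothetical launch commands for a source install of ComfyUI;
# main.py in the repo root is an assumption, adjust for portable builds.

# Minimize the VRAM ComfyUI keeps resident between generations:
python main.py --disable-smart-memory

# The reserve option mentioned above can be stacked on top; consult
# `python main.py --help` for its exact spelling and accepted units.
```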
I was having horrendous slowdown issues with the previous portable release, sometimes multiple minutes per iteration, which made batch runs impossible. However, updating to the latest release v0.2.2, with its PyTorch cu124 update, has me back down to 2.6 s/it.
7950x, 64GB DDR5, RTX 3080 10GB
Might fix others' issues too?
Expected Behavior
No high VRAM usage, and no extreme slowness with ControlNet.
Actual Behavior
Technical details: latest version of ComfyUI, 3090, PyTorch 2.1, CUDA 12.1, Windows.
I currently use ComfyUI in production and this is really blocking: using multiple LoRAs above rank 32 on top of Flux is extremely VRAM-hungry, and using any ControlNet with ControlNetApplyAdvanced, or even the one for SD3/HunyuanDiT, is extremely slow.
ComfyUI is currently not stable with my configuration (Windows is not a choice).
For the record, using GGUF doesn't help at all, since speed is 1.8× slower and ControlNet support is not working for all models.
Steps to Reproduce
Technical details: latest version of ComfyUI, 3090, PyTorch 2.1, CUDA 12.1, Windows.
Just use any ControlNet, or stack high-rank LoRAs.
Debug Logs
Other
None