Closed: RandomGitUser321 closed this issue 3 months ago
And if you change the prompt text, the "out of memory" error will definitely appear.
Yeah, that's because it would normally try to offload the model to sysmem, then shuffle the t5 back into vram to generate the new prompt, then offload the t5 and reload the model, then sample. But since BnB is anchoring it, you'll probably OOM.
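Roughly, the shuffle I mean looks like this in plain PyTorch terms (a minimal sketch of the order of operations, not ComfyUI's actual model management code):

import torch

# Sketch (not ComfyUI's actual code) of what normally happens when the
# prompt text changes. `unet` and `t5_clip` are assumed to be ordinary
# torch.nn.Module instances here.
def regenerate_with_new_prompt(unet, t5_clip, tokens):
    unet.to("cpu")              # 1. offload the diffusion model to system RAM
    t5_clip.to("cuda")          # 2. bring the T5 text encoder back into VRAM
    with torch.no_grad():
        cond = t5_clip(tokens)  # 3. encode the new prompt
    t5_clip.to("cpu")           # 4. push T5 back out to system RAM
    unet.to("cuda")             # 5. reload the diffusion model
    return cond                 # 6. ...then sample with `unet` as usual

# With a bitsandbytes NF4 checkpoint, step 1 is the part that doesn't work
# cleanly (the quantized weights stay anchored on the GPU), so the T5 load
# in step 2 has nowhere to go and the run OOMs instead.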
I have 12 GB of VRAM and the model (flux1-schnell-bnb-nf4.safetensors) is 11.2 GB, so it barely fits in VRAM (I'm keeping my system very barebones and clean).
The first generation runs fast from VRAM, but on the second prompt ComfyUI switches to lowvram mode, which loads and unloads models to RAM and makes it very slow.
Is there any way to make it stay in VRAM?
Or if the problem is the T5 CLIP, is there any way to fix that?
This should be fixed with the latest commit.
I confirm. Now there is no OOM. Thanks.
The fix doesn't work for me. After the first generation it executes the text encoder on the CPU.
got prompt
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Using xformers attention in VAE
Using xformers attention in VAE
Requested to load FluxClipModel_
Loading 1 new model
C:\AI\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\attention.py:407: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load Flux
Loading 1 new model
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.62s/it]
Requested to load AutoencodingEngine
Loading 1 new model
Prompt executed in 14.66 seconds
got prompt
loaded in lowvram mode 3991.193339538574
loaded completely 6049.751877021789 5859.856831550598
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.57s/it]
Requested to load AutoencodingEngine
Loading 1 new model
Prompt executed in 33.91 seconds
As long as the prompt is not changed, generating again with a new seed does not touch the T5 CLIP, which makes those generations roughly 2x faster than a generation right after the prompt has changed.
For me, and possibly for you as well, when the prompt is changed the T5 CLIP slows everything down and pushes Comfy into low-VRAM mode.
I don't know why, but that's what I've been observing.
Is there any explanation for this, and how can I prevent it?
I use Schnell NF4 on an RTX 3060 with 12 GB VRAM and a PC with 32 GB RAM.
I found a workaround that prevents the slowdown after changing the prompt:
click Unload Model in ComfyUI Manager.
Profit! It's much faster this way.
Loading the models again is not that slow; it's faster than waiting for the CLIP pass in lowvram mode.
It's as fast as it should be:
compared to that, 45 seconds per image (prompt change without unloading the model) is about 2.5x slower, while unloading the models only adds 2-3 seconds to the total generation time.
RTX 3060 12 GB, 32 GB RAM.
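If you'd rather script that button than click it, something like the following seems to work. It's a sketch that assumes your build exposes the /free API route (which, as far as I can tell, is what the Manager's unload button calls); adjust the host and port to your --listen/--port settings:

import json
import urllib.request

# Ask ComfyUI to unload models and free cached VRAM before the next prompt.
# 8188 is the default port; change it to match your setup.
payload = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/free",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print("free request status:", resp.status)  # 200 means the request was accepted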
Can I see your exact workflow?
Sure, here it is. Thank you @comfyanonymous for your help and attention.
The problem is shown in this video: https://youtu.be/2JaADaPbHOI
And both the node and comfyui are updated to the latest version?
ComfyUI: e9589d6d9246d1ce5a810be1507ead39fff50e04 (17 hours ago)
The node: f1935bd901860d4c1401dde5106f4c9543735ce8 (5 hours ago)
I see there's an update in ComfyUI: "Support loading directly to vram with CLIPLoader node."
I'll check it and give an update.
I also notice a bit of a speed gain in the total prompt time after clicking the clear model button.
I updated both Comfy and the node to the latest git pull, and I still have this problem.
26 seconds: initial load, first generation
16 seconds: 2nd generation without changing the prompt
47 seconds: 3rd generation with the prompt changed
16 seconds: 4th generation without changing the prompt
20 seconds: 5th generation with the prompt changed + unloading the model first
Shown in this video: https://youtu.be/nmjhOKDp6VY
What Windows version and CPU are you guys using? This could be that annoying Windows 11 scheduler issue where it sometimes runs stuff on e-cores.
I'm using an older Ryzen 5 3600, RTX 3060 12GB, 32 GB RAM, OS: Windows 11 kept up to date, currently 23H2 Build 22631.3958.
Python 3.10.6, manual ComfyUI install (non-portable), inside a virtualenv, torch 2.2.2+cu121.
Arguments when running: .\venv\Scripts\python.exe -s main.py --disable-xformers --listen --port 8189
List of packages installed inside the virtualenv: pip list.txt
Okay, I'll go look into that scheduler issue. I can try running it in Ubuntu under WSL2 if all else fails.
Yeah I don't know much about AMD CPUs, but if they have some kind of equivalent to p-cores and e-cores, it could be a similar thing. Just throwing it out there as an idea, it may or may not be relevant.
I found a slightly more convenient workaround: use a modified node from https://github.com/LarryJane491/ComfyUI-ModelUnloader to automatically unload the models after the image is generated.
from comfy import model_management


class ModelUnloader:
    """Pass-through node that unloads every loaded model after an image is produced."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
            },
            "optional": {},
        }

    RETURN_TYPES = ("IMAGE",)
    RETURN_NAMES = ("image_output",)
    FUNCTION = "unload_model"
    CATEGORY = "loaders"

    def unload_model(self, image):
        loaded_models = model_management.current_loaded_models
        unloaded_model = False
        # Walk the list backwards so pop() does not disturb the remaining indices.
        for i in range(len(loaded_models) - 1, -1, -1):
            m = loaded_models.pop(i)
            m.model_unload()
            del m
            unloaded_model = True
        if unloaded_model:
            # Release the now-unused CUDA blocks back to the allocator.
            model_management.soft_empty_cache()
        # Pass the image straight through so the node can sit between
        # VAE Decode and Save Image.
        return (image,)


NODE_CLASS_MAPPINGS = {
    "Model unloader": ModelUnloader,
}
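If it helps anyone else wiring this in: as far as I can tell, ComfyUI also loads standalone .py files dropped straight into custom_nodes/ as long as they define NODE_CLASS_MAPPINGS, so the snippet above can live in its own file. Optionally, a display-name mapping (the label below is just my suggestion) makes the node easier to find in the search:

# Optional: a friendlier label in the node search. ComfyUI reads this dict
# if it exists in the same file as NODE_CLASS_MAPPINGS.
NODE_DISPLAY_NAME_MAPPINGS = {
    "Model unloader": "Model Unloader (free VRAM after image)",
}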
@Ulexer the node won't connect to anything? How can I use it?
Edit: oh I see, I need to paste your code into that node (edit the modelunload.py file).
NICE! It solved my problem, really on point!
Now I can use Flux Schnell NF4 and ComfyUI without any issues, because the Wildcard node changes the prompt with every generation.
Without the unload node you edited, it slows the process down significantly by putting it in low-VRAM mode.
Thank you! You should create a pull request for that node, though. Your edit made it work, so thank you @Ulexer!
Your node didn't work for me, so I used the other unloader, but the idea worked, so thank you.
If you still have issues with the latest ComfyUI, can you run it with --verbose and give me the full log?
@comfyanonymous Okay, I've updated ComfyUI to the latest version and added the --verbose argument.
Here's the test:
With the unloading method (from the Manager), even with the prompt changed, the log is:
Unloading AutoencodingEngine
Unloading FluxClipModel
Unloading Flux
got prompt
...
...
Prompt executed in 19.31 seconds <-- no lowvram, even with the prompt changed
For me it's 46 vs 19 seconds, so I'd choose to unload the models even if it's not convenient.
Here's the workflow: Workflow-NF4-Schnell.json
Here are the verbose logs: verbose-logs-20240813-comfyui.txt
Thanks it should actually be fixed now if you update ComfyUI.
Wow, that was fast. You're a genius, many thanks.
Generating images with the prompt changed now takes only 21 seconds:
Prompt executed in 21.11 seconds
I saw log lines about the models being unloaded when the prompt changed.
Even better than the workaround above (the unload node): without changing the prompt there's no need to unload, and the speed stays fast:
Prompt executed in 15.83 seconds
Thank you very much @comfyanonymous, you are the best!
As far as I know, a BnB'd model will anchor itself in VRAM and can't easily be moved back to system memory. I have 8 GB of VRAM, and even after sampling it stays mostly full and keeps triggering the VAE decode to fall back to tiled mode because there isn't enough VRAM left.
It doesn't seem to free the VRAM even after deleting the CheckpointLoaderNF4 workflow; if I then build a new workflow using an SDXL model, for instance, the VRAM is still full. If I'm not mistaken, I think you have to delete the whole object that's anchored in VRAM, but maybe keep a copy in system memory so it can reload quickly next time (RAM -> VRAM instead of from the drive)?
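For what it's worth, a quick way to test the "drop the whole anchored object" idea from a custom node or a script running inside ComfyUI's process is to ask the model manager to let go of everything. This is a sketch that assumes `comfy` is importable, i.e. you are running inside ComfyUI's own environment:

from comfy import model_management

# Ask ComfyUI to unload every model it is currently tracking, then release
# the cached CUDA blocks back to the driver so the VRAM is actually usable
# by whatever workflow runs next.
model_management.unload_all_models()
model_management.soft_empty_cache()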