comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

If I add a checkpoint loader node and output only its clip output, will the unused parts take RAM/VRAM? #4998

Open sipie800 opened 1 month ago

sipie800 commented 1 month ago

Your question

Say I use a Checkpoint Loader to load a checkpoint that contains diffusion/CLIP/VAE components, and I connect only its CLIP output to the following node. Will the unused diffusion and VAE parts still be loaded into memory and take up room even though they aren't actually doing any work?

Could you please explain a bit more about the resource strategy the checkpoint loader uses?

Logs

No response

Other

No response

ltdrdata commented 1 month ago

The moment the Checkpoint Loader node is executed, all of its outputs are cached. This is the basic behavior of node execution. If you only want to keep the CLIP, you should use the CLIP Loader instead.
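For reference, here is a simplified sketch of what a checkpoint-loader-style node does (not the exact ComfyUI source; `comfy.sd.load_checkpoint_guess_config` and `folder_paths.get_full_path` are the real helpers, but the rest is abbreviated). The node function builds all three objects in one call and returns them together, and the executor caches that whole result tuple regardless of which output sockets are connected downstream:

```python
# Simplified sketch of a checkpoint-loader-style node; not the exact ComfyUI source.
import comfy.sd
import folder_paths

class CheckpointLoaderSketch:
    RETURN_TYPES = ("MODEL", "CLIP", "VAE")   # all three outputs are produced together
    FUNCTION = "load_checkpoint"

    def load_checkpoint(self, ckpt_name):
        ckpt_path = folder_paths.get_full_path("checkpoints", ckpt_name)
        # A single call parses the checkpoint and instantiates the diffusion
        # model, the CLIP and the VAE (exact signature abbreviated here).
        out = comfy.sd.load_checkpoint_guess_config(
            ckpt_path, output_vae=True, output_clip=True)
        # The executor caches this whole tuple. Even if only the CLIP socket is
        # connected downstream, the MODEL and VAE objects stay alive in the cache (RAM).
        return out[:3]
```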

In the current smart memory management structure, models are loaded into VRAM at the moment they are actually needed. When loading a new model into VRAM, if there's not enough space, existing models in VRAM are offloaded to RAM.
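To make that policy concrete, here is a rough illustration of the idea, my own sketch rather than the actual `comfy.model_management` code: a model's weights stay in RAM until a GPU step needs them, and if VRAM is short, already-loaded models are moved back to the CPU instead of being discarded.

```python
# Rough illustration of load-on-demand with offload-to-RAM; not ComfyUI's actual code.
import torch

def model_size_bytes(m: torch.nn.Module) -> int:
    return sum(p.numel() * p.element_size() for p in m.parameters())

def free_vram_bytes() -> int:
    free, _total = torch.cuda.mem_get_info()
    return free

def ensure_on_gpu(model: torch.nn.Module, loaded: list):
    needed = model_size_bytes(model)
    # Evict already-loaded models until the new one fits.
    while free_vram_bytes() < needed and loaded:
        victim = loaded.pop(0)   # oldest loaded model...
        victim.to("cpu")         # ...is offloaded back to system RAM, not discarded
        torch.cuda.empty_cache()
    model.to("cuda")             # moved to VRAM only at the moment it is needed
    loaded.append(model)
    return model
```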

If you want to run a workflow with extremely reduced memory usage, you can structure it as follows: Load CLIP and use CLIPTextEncode to generate conditioning, then cache this in a node like Backend Cache (in Inspire Pack). After that, remove the CLIP loader and switch to a workflow that only performs diffusion using Retrieve Backend Cache and the diffusion model.

When the workflow transitions, the cache of the removed nodes is also released.

sipie800 commented 1 month ago

What do you mean by cached? Does that mean cached in VRAM? It's also confusing that you say models are loaded into VRAM at the moment they are actually needed. What if some of the outputs are not needed?

So far the third-party HunyuanDiT loaders have stopped working because of recent ComfyUI updates, so there is no way to load HunyuanDiT's CLIP alone: HunyuanDiT's "CLIP" is actually a dual text encoder made of a RoBERTa model and an mT5 model. I tested your DualCLIPLoader, and it's not actually compatible with chinese-roberta-wwm-ext-large or mT5-xl.

Given that these third-party nodes have broken, I don't think they will keep up with ComfyUI's updates, so we need to rely on better use of the official Comfy nodes. These are the nodes that stopped working: https://github.com/Tencent/HunyuanDiT/tree/main/comfyui-hydit and https://github.com/city96/ComfyUI_ExtraModels.git. They used to work. If I downgrade ComfyUI, newer models like Flux or CogVideo stop working, so downgrading is not an option.

So for HunyuanDiT, we may need more compatible text encoder nodes that can load chinese-roberta-wwm-ext-large and/or mT5-xl.
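For reference, the two encoders load fine standalone with Hugging Face transformers; something like the sketch below is roughly what a compatible loader node would have to wrap. The model IDs are just the usual Hub names I'd assume, not something ComfyUI ships:

```python
# Standalone sketch of HunyuanDiT's dual text encoder via Hugging Face transformers.
# Model IDs are assumed Hub names, used only for illustration.
from transformers import AutoTokenizer, AutoModel, MT5EncoderModel

# Chinese RoBERTa branch
roberta_tok = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
roberta = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext-large")

# mT5-XL branch (encoder only)
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-xl")
mt5 = MT5EncoderModel.from_pretrained("google/mt5-xl")

prompt = "一只可爱的猫"  # "a cute cat"
rob_emb = roberta(**roberta_tok(prompt, return_tensors="pt")).last_hidden_state
mt5_emb = mt5(**mt5_tok(prompt, return_tensors="pt")).last_hidden_state
print(rob_emb.shape, mt5_emb.shape)  # two separate embedding sequences, one per encoder
```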

I also tested using a Checkpoint Loader to load an all-in-one HunyuanDiT checkpoint plus a diffusion loader to load an alternative DiT checkpoint. It works, and VRAM usage seems to be around 9 GB with or without the diffusion loader. So may I infer that the DiT part from the Checkpoint Loader is not loaded into VRAM? (screenshot: 2024-09-21_101404) What happens in such a config?

This one, on the other hand, is certainly not working: (screenshot: 2024-09-21_102903)

github-actions[bot] commented 3 weeks ago

This issue is being marked stale because it has not had any activity for 30 days. Reply below within 7 days if your issue still isn't solved, and it will be left open. Otherwise, the issue will be closed automatically.

ltdrdata commented 3 weeks ago

By default, a node's execution output is cached in RAM, and models are only loaded from RAM to VRAM when GPU computation is actually needed. This is how the system maximizes the availability of limited VRAM.

The reason for caching node execution results in RAM is to prevent re-computation. For example, if loading a model takes 1 minute and the model isn't cached in RAM, you would waste an enormous amount of time loading from disk every time you need to use the model.
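As a minimal illustration of that trade-off (my own sketch, not ComfyUI's actual cache code): keying results by node and inputs turns a repeated slow disk load into an instant lookup from RAM.

```python
# Minimal sketch of executor-style output caching; not ComfyUI's cache implementation.
import time

_output_cache = {}   # (node id, inputs) -> cached outputs, held in RAM

def load_model_from_disk(path):
    time.sleep(60)        # stand-in for a ~1 minute checkpoint load
    return object()       # stand-in for the loaded model

def cached_execute(node_id, path):
    key = (node_id, path)
    if key not in _output_cache:              # first run: pay the disk cost once
        _output_cache[key] = load_model_from_disk(path)
    return _output_cache[key]                 # later runs: instant lookup from RAM
```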