comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/

If I add a Checkpoint Loader node and use only its CLIP output, will the unused parts take RAM/VRAM? #4998

Open sipie800 opened 3 days ago

sipie800 commented 3 days ago

Your question

Say I use a Checkpoint Loader to load a checkpoint that contains diffusion/CLIP/VAE components, and I connect only the CLIP output to the following node. Will the unused diffusion and VAE parts still be loaded into memory and take up room even though they are never actually used?

Could you please explain a little more about the resource strategy the Checkpoint Loader uses?

Logs

No response

Other

No response

ltdrdata commented 3 days ago

The moment the Checkpoint Loader node is executed, all of its outputs are cached; this is the basic behavior of node execution. If you only want to keep the CLIP, you should use the CLIP Loader instead.
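For intuition, here is a minimal sketch of that caching behavior, assuming a simplified executor; it is not ComfyUI's actual code, and the names are invented for illustration:

```python
# Illustrative sketch only, NOT ComfyUI's executor: a loader node
# returns all of its outputs as one tuple, and the executor caches
# that whole tuple keyed by node id. Unused outputs therefore stay
# alive in memory alongside the ones you actually consume.
class ExecutionCache:
    def __init__(self):
        self.outputs = {}  # node_id -> tuple of every output of that node

    def run(self, node_id, execute):
        if node_id not in self.outputs:
            # All outputs are produced and cached together, even if
            # downstream nodes only ever read one of them.
            self.outputs[node_id] = execute()
        return self.outputs[node_id]

# Hypothetical usage: the (model, clip, vae) tuple is cached as a unit.
cache = ExecutionCache()
model, clip, vae = cache.run("ckpt_loader_1", lambda: ("model", "clip", "vae"))
```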

In the current smart memory management structure, models are loaded into VRAM at the moment they are actually needed. When loading a new model into VRAM, if there's not enough space, existing models in VRAM are offloaded to RAM.
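A rough sketch of that load-on-demand, offload-when-full strategy, again as an illustration rather than the real comfy.model_management code (`gpu_models` and `load_when_needed` are hypothetical names):

```python
import torch

# Illustrative only: models stay in RAM until a node actually needs
# them, and older models are pushed out of VRAM to make room.
gpu_models = []  # hypothetical registry of models currently in VRAM

def load_when_needed(model, bytes_needed):
    total = torch.cuda.get_device_properties(0).total_memory
    # Offload existing models to system RAM until the new one fits.
    while gpu_models and torch.cuda.memory_allocated() + bytes_needed > total:
        victim = gpu_models.pop(0)  # evict the oldest loaded model
        victim.to("cpu")            # weights move back to RAM, not deleted
        torch.cuda.empty_cache()
    model.to("cuda")                # loaded only at the moment of use
    gpu_models.append(model)
    return model
```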

If you want to run a workflow with extremely reduced memory usage, you can structure it as follows: load the CLIP and use CLIPTextEncode to generate the conditioning, then cache it in a node like Backend Cache (from the Inspire Pack). After that, remove the CLIP loader and switch to a workflow that only performs diffusion, using Retrieve Backend Cache and the diffusion model.

When the workflow transitions, the cache of the removed nodes is also released.
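In ComfyUI's API prompt format, the two phases of that workflow might look roughly like the following. The Inspire Pack class names (`CacheBackendData`, `RetrieveBackendData`) and all inputs here are best-effort assumptions, so treat this as a sketch of the shape, not a paste-ready workflow:

```python
# Phase 1: encode the text once and stash the conditioning in the
# backend cache (node names/inputs are assumptions, not verified).
phase1 = {
    "1": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "clip_model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 0], "text": "a cat wearing a hat"}},
    "3": {"class_type": "CacheBackendData",
          "inputs": {"data": ["2", 0], "key": "cond"}},
}

# Phase 2: a separate workflow with no CLIP loader at all; the cached
# conditioning is retrieved and fed to the sampler alongside the
# diffusion model, so the text encoders never occupy memory here.
phase2 = {
    "1": {"class_type": "RetrieveBackendData",
          "inputs": {"key": "cond"}},
    # ... diffusion loader / KSampler / VAEDecode nodes go here ...
}
```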

sipie800 commented 3 days ago

What do you mean by cached? Does that mean cached into VRAM? It's also confusing that you say models are loaded into VRAM at the moment they are actually needed. What if some of the outputs are never needed?

So far, the third-party HunyuanDiT loaders have stopped working due to recent ComfyUI updates, so there is no way to load HunyuanDiT's CLIP alone. HunyuanDiT's "CLIP" is actually a dual text encoder consisting of a RoBERTa model and an mT5 model. I tested your DualCLIPLoader, and it is not compatible with chinese-roberta-wwm-ext-large or mT5-xl.

Given that the third-party nodes have broken, I don't think they will keep up with ComfyUI's updates, so we need to rely on better use of the official Comfy nodes. These are the nodes that stopped working: https://github.com/Tencent/HunyuanDiT/tree/main/comfyui-hydit and https://github.com/city96/ComfyUI_ExtraModels.git. They used to work, but if I downgrade ComfyUI, newer models like Flux or CogVideo stop working, so downgrading is not an option.

So for HunyuanDiT, we may need more compatible text encoder nodes that can load chinese-roberta-wwm-ext-large and/or mT5-xl; see the sketch below.
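As a hedged sketch of what such a node could wrap internally, both encoders load fine with plain Hugging Face transformers (the model ids below are the public HF repos; wiring the embeddings into HunyuanDiT's conditioning would still be the node's job):

```python
from transformers import AutoModel, AutoTokenizer, MT5EncoderModel

# chinese-roberta-wwm-ext-large: HunyuanDiT's first text encoder.
roberta_tok = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext-large")
roberta = AutoModel.from_pretrained("hfl/chinese-roberta-wwm-ext-large")

# mT5-xl: the second text encoder; only the encoder half is needed.
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-xl")
mt5 = MT5EncoderModel.from_pretrained("google/mt5-xl")

# Example: embed a Chinese prompt with the RoBERTa encoder.
tokens = roberta_tok("一只戴帽子的猫", return_tensors="pt")
embeddings = roberta(**tokens).last_hidden_state
```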

I also tested using a Checkpoint Loader to load an all-in-one HunyuanDiT checkpoint alongside a Diffusion Loader loading an alternative DiT checkpoint. It works, and VRAM usage seems to be around 9 GB with or without the Diffusion Loader. May I infer that the DiT part from the Checkpoint Loader is not loaded into VRAM? [screenshot: 2024-09-21_101404] What happens in such a configuration?
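One way to check that inference (assuming a CUDA build of PyTorch, and that the probe runs inside the ComfyUI process, e.g. from a small custom node) is to compare allocated VRAM before and after queueing the prompt:

```python
import torch

def vram_gb() -> float:
    # Tensor memory currently allocated on the default GPU by this process.
    return torch.cuda.memory_allocated() / 1024**3

# Hypothetical probe: if this number barely moves when the extra loader
# is added, the cached-but-unused DiT weights never reached the GPU.
print(f"allocated VRAM: {vram_gb():.2f} GB")
```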

This configuration, however, is certainly not working: [screenshot: 2024-09-21_102903]