sipie800 opened this issue 1 month ago
The moment the Checkpoint Loader node is executed, all of its outputs are cached. This is the basic behavior of node execution.
If you only want to keep the CLIP, you should use the CLIP Loader instead.
In the current smart memory management structure, models are loaded into VRAM at the moment they are actually needed. When loading a new model into VRAM, if there's not enough space, existing models in VRAM are offloaded to RAM.
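For intuition, here is a rough Python sketch of that load-on-demand behavior. The names (`ModelSlot`, `VramPool`, `ensure_loaded`) are made up for illustration and are not ComfyUI's actual model-management API.

```python
# Illustrative sketch only: load a model into VRAM right before it is needed,
# offloading other models back to RAM when space runs out.
class ModelSlot:
    def __init__(self, name, size_bytes, weights):
        self.name = name
        self.size_bytes = size_bytes
        self.weights = weights        # stays resident in RAM either way
        self.on_gpu = False

class VramPool:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.resident = []            # models currently on the GPU

    def used(self):
        return sum(m.size_bytes for m in self.resident)

    def ensure_loaded(self, model):
        """Load `model` into VRAM on demand, offloading others to RAM if needed."""
        if model.on_gpu:
            return
        while self.used() + model.size_bytes > self.capacity and self.resident:
            victim = self.resident.pop(0)     # offload the oldest model first
            victim.on_gpu = False             # its weights remain cached in RAM
            print(f"offloaded {victim.name} to RAM")
        model.on_gpu = True
        self.resident.append(model)
        print(f"loaded {model.name} into VRAM")

pool = VramPool(capacity_bytes=12 * 1024**3)
clip = ModelSlot("clip", 2 * 1024**3, weights=...)
unet = ModelSlot("unet", 11 * 1024**3, weights=...)

pool.ensure_loaded(clip)   # text encoding step
pool.ensure_loaded(unet)   # sampling step; clip gets offloaded to make room
```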
If you want to run a workflow with extremely reduced memory usage, you can structure it as follows: load CLIP and use CLIPTextEncode to generate the conditioning, then cache it in a node like Backend Cache (from the Inspire Pack). After that, remove the CLIP loader and switch to a workflow that only performs diffusion, using Retrieve Backend Cache and the diffusion model.
When the workflow transitions, the cached outputs of the removed nodes are also released.
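To make the idea concrete, here is a toy Python sketch of that two-stage approach. The dictionary and functions below are stand-ins for the nodes named above, not the actual Inspire Pack API.

```python
# Toy sketch of the two-stage, low-memory workflow described above.
backend_cache = {}                        # stands in for Backend Cache / Retrieve Backend Cache

def stage_one(prompt):
    clip = {"name": "clip-model"}         # CLIP Loader (toy object instead of real weights)
    conditioning = f"encoded({prompt})"   # CLIPTextEncode
    backend_cache["cond"] = conditioning  # Backend Cache keeps only the small result
    # Once the CLIP loader is removed from the workflow, the CLIP weights
    # no longer need to stay cached.

def stage_two():
    conditioning = backend_cache["cond"]  # Retrieve Backend Cache
    unet = {"name": "diffusion-model"}    # diffusion model loader only
    return f"sampled({unet['name']}, {conditioning})"

stage_one("a cat in the snow")
print(stage_two())
```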
What do you mean by cached? Does that mean cached in VRAM? It's also confusing that you say models are loaded into VRAM at the moment they are actually needed. What if some of the outputs are not needed?
So far the 3rd-party HunyuanDiT loaders have stopped working due to recent ComfyUI updates. So there is no way to load HunyuanDiT's CLIP alone, because HunyuanDiT's "CLIP" is actually a dual text encoder made up of a RoBERTa model and an mT5 model. I tested your dual CLIP loader; it's not compatible with chinese-roberta-wwm-ext-large or mT5-xl.
Given the failure of these 3rd-party nodes, I don't think they will keep up with ComfyUI's updates. We need to count on better usage of the official Comfy nodes. Here are the nodes that stopped working: https://github.com/Tencent/HunyuanDiT/tree/main/comfyui-hydit https://github.com/city96/ComfyUI_ExtraModels.git They used to work. If I downgrade ComfyUI, newer models like Flux or CogVideo will stop working, so downgrading is not an option.
So for HunyuanDiT, we may need more compatible text encoder nodes to load chinese-roberta-wwm-ext-large and/or mT5-xl.
I also tested using the checkpoint loader to load an all-in-one HunyuanDiT checkpoint and a diffusion loader to load an alternative DiT checkpoint. It works, and VRAM usage seems to be around 9GB with or without the diffusion loader. May I infer that the DiT part from the checkpoint loader is not loaded into VRAM? What happens in such a configuration?
This is certainly not working.
This issue is being marked stale because it has not had any activity for 30 days. Reply below within 7 days if your issue still isn't solved, and it will be left open. Otherwise, the issue will be closed automatically.
By default, a node's execution output is cached in RAM, and models are only loaded from RAM to VRAM when GPU computation is actually needed. This is how the system maximizes the availability of limited VRAM.
The reason for caching node execution results in RAM is to prevent re-computation. For example, if loading a model takes 1 minute and the model isn't cached in RAM, you would waste an enormous amount of time loading from disk every time you need to use the model.
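As a rough illustration of that caching behavior (not ComfyUI's actual execution cache), a memoized node runner might look like this, where a repeated run with identical inputs skips the slow disk load entirely:

```python
# Sketch: node outputs are cached in RAM keyed by the node and its inputs,
# so identical re-runs reuse the previous result instead of recomputing it.
import time

_output_cache = {}

def run_node(node_id, inputs, compute):
    key = (node_id, tuple(sorted(inputs.items())))
    if key in _output_cache:
        return _output_cache[key]          # reuse previous result, no recompute
    result = compute(**inputs)             # e.g. a slow checkpoint load from disk
    _output_cache[key] = result            # keep it in RAM for later runs
    return result

def slow_load(path):
    time.sleep(1)                          # pretend this is a long disk load
    return f"weights from {path}"

# The first run pays the load cost; the second run with the same inputs is instant.
run_node("CheckpointLoader", {"path": "model.safetensors"}, slow_load)
run_node("CheckpointLoader", {"path": "model.safetensors"}, slow_load)
```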
Your question
Say I use the checkpoint loader to load a checkpoint that contains diffusion/clip/vae, and I wire only the CLIP output to the following node. Will the unused diffusion and VAE parts be loaded into memory and take up room even though they are not actually used?
Could you please explain a bit more about the resource strategy the checkpoint loader uses?
Logs
No response
Other
No response