comfyanonymous / ComfyUI

The most powerful and modular Stable Diffusion GUI, API, and backend with a graph/nodes interface.
GNU General Public License v3.0
40.19k stars · 4.28k forks

Node processing increases exponentially for large workflows #2451

Open whmc76 opened 5 months ago

whmc76 commented 5 months ago

My ComfyUI gets stuck at the beginning and won't run unless I remove some steps. Is there any way to see what's wrong? (screenshot)

whmc76 commented 5 months ago

I tried adding a new KSampler step to a more complex workflow, and everything got stuck. If I bypass some of the steps I can resume running, but then I can't execute all the nodes, which is confusing to me.

I've tried, among other things, replacing different sampler nodes, using different seed-generation methods, and reducing the number of loaded control modules, but the only thing that works is bypassing one or two IPAdapters or KSamplers. I don't know what's wrong with this.

whmc76 commented 5 months ago

After more than ten minutes of waiting, I saw that the workflow was executing very slowly. It seemed to be running on the CPU instead of the GPU. Removing some steps restored normal speed...

```
got prompt
Loads SAM model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\sams\sam_vit_b_01ec64.pth (device:Prefer GPU)
model_type EPS
adm 0
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['model_ema.decay', 'model_ema.num_updates', 'cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
loaded straight to GPU
Requested to load BaseModel
Loading 1 new model
Requested to load SD1ClipModel
Loading 1 new model
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
Leftover VAE keys ['model_ema.decay', 'model_ema.num_updates']

0: 640x480 1 face, 115.6ms
Speed: 12.1ms preprocess, 115.6ms inference, 114.8ms postprocess per image at shape (1, 3, 640, 480)

0: 640x608 1 face, 107.6ms
Speed: 3.0ms preprocess, 107.6ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 608)

0: 640x608 1 hair, 6.0ms
Speed: 3.0ms preprocess, 6.0ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 608)
WAS Node Suite: Face found with: haarcascade_frontalface_alt2.xml
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\insightface\models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\insightface\models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\insightface\models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\insightface\models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: E:\IMAGE\ComfyUI_windows_portable\ComfyUI\models\insightface\models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
INFO: the IPAdapter reference image is not a square, CLIPImageProcessor will resize and crop it at the center. If the main focus of the picture is not in the middle the result might not be what you are expecting.
Requested to load CLIPVisionModelProjection
Loading 1 new model
Requested to load CLIPVisionModelProjection
Loading 1 new model
[] []
Requested to load AutoencoderKL
Loading 1 new model
Requested to load BaseModel
Requested to load ControlNet
Loading 2 new models
unload clone 4
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.09it/s]
[] []
set os.environ[OMP_NUM_THREADS] to 4
=> loading pretrained model E:\IMAGE\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts\hr16/ControlNet-HandRefiner-pruned\hrnetv2_w64_imagenet_pretrained.pth
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
[] []
Requested to load BaseModel
Requested to load ControlNet
Requested to load ControlNet
Loading 3 new models
unload clone 2
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 7.82it/s]
set os.environ[OMP_NUM_THREADS] to 4
=> loading pretrained model E:\IMAGE\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts\hr16/ControlNet-HandRefiner-pruned\hrnetv2_w64_imagenet_pretrained.pth
[] [] [] [] [] []
Requested to load BaseModel
Requested to load ControlNet
Requested to load ControlNet
Requested to load ControlNet
Loading 4 new models
unload clone 3
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:17<00:00, 1.12it/s]

0: 640x576 (no detections), 96.4ms
Speed: 3.5ms preprocess, 96.4ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 576)
Prompt executed in 985.43 seconds
```

ltdrdata commented 5 months ago

This issue should be reported here: https://github.com/chrisgoringe/cg-use-everywhere

whmc76 commented 5 months ago

> This issue should be reported here: https://github.com/chrisgoringe/cg-use-everywhere

Same issue when I use the KSampler's widget seed; it doesn't matter which seed node I use. (screenshot)

ltdrdata commented 5 months ago

First, it would be helpful to discuss the most simplified workflow where this issue occurs. At the moment, it's not clear what exactly your issue is. If possible, it would be best to diagnose the issue without using any custom nodes, so the problem can be properly identified.

Looking at the screenshot, I got confused, thinking that it was stuck in the UE node.

whmc76 commented 5 months ago

> First, it would be helpful to discuss the most simplified workflow where this issue occurs. At the moment, it's not clear what exactly your issue is. If possible, it would be best to diagnose the issue without using any custom nodes, so the problem can be properly identified.

> Looking at the screenshot, I got confused, thinking that it was stuck in the UE node.

I know it's weird. I'll try to simplify the scenario where this problem occurs later, but I can get it back up and running by bypassing a KSampler, so I feel it must have something to do with that.

I just switched to the simple KSampler method, and I gave up after it was stuck for over 600 seconds. (screenshot)

ltdrdata commented 5 months ago

Sometimes, issues arise when custom nodes cause problems internally, even if those nodes are not actively being used. Try checking for the issue with as many custom nodes disabled as possible.

whmc76 commented 5 months ago

> Sometimes, issues arise when custom nodes cause problems internally, even if the nodes are not actively being used. Try checking for the issue with custom nodes disabled as much as possible.

stuck issue sample.json — I made a simple workflow to replicate this problem. It uses only a few third-party nodes. Normally each part executes quickly, but now everything runs together at a snail's pace. Please help me solve this problem; it prevents most of my complex workflows from running efficiently.

ltdrdata commented 5 months ago

> Sometimes, issues arise when custom nodes cause problems internally, even if the nodes are not actively being used. Try checking for the issue with custom nodes disabled as much as possible.
>
> stuck issue sample.json I made a simple workflow to replicate this problem. It uses only a few third-party nodes. Normally each part executes quickly, but now everything runs together at a snail's pace.

Is this really a simplified workflow?

poisenbery commented 5 months ago

> Is this really a simplified workflow?

In my case, the problem only occurs once the node tree reaches a certain size. Basically, everything works fine until the workflow reaches a certain size; past that point, adding more nodes makes ComfyUI spend an exponentially longer time on "got prompt". The time grows even longer with each additional node (e.g., adding just 2 more nodes can add another 10 minutes to "got prompt" processing time).

I will attempt to re-create the issue with vanilla ComfyUI nodes later today, but it will take some time. I have also been experiencing very strange slowdowns once my node tree exceeds a certain size. Initial gen: (screenshot)

After first gen: (screenshot)

NGL, it would be nice if we could just force ComfyUI to run the entire tree without having to check all the inputs. I experimented with the ComfyUI-to-Python script, and the generated scripts ran without the slowdown issues.

> Looking at the screenshot, I got confused, thinking that it was stuck in the UE node.

This node was actually a major offender and a massive contributor to the bug, but not the cause. Removing UE let me process the workflow at a decent speed, but once I started adding more nodes I eventually reached that "point" again, and the slowdowns resumed as before.
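One quick way to confirm where the time goes is to bracket suspect phases with a timer. The sketch below is generic and not ComfyUI's actual code; `validate_prompt_stub` is a hypothetical stand-in for whatever phase (e.g., the pre-execution "got prompt" validation) you want to measure:

```python
import functools
import time

def timed(fn):
    """Log the wall-clock time of each call; handy for bracketing
    suspect phases when hunting down a slowdown."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            print(f"{fn.__name__} took {time.perf_counter() - start:.2f}s")
    return wrapper

@timed
def validate_prompt_stub():
    time.sleep(0.05)  # stand-in for the real validation work
    return True
```

Timing individual phases like this would distinguish a slow "got prompt" (graph validation) from slow node execution, which the two screenshots above suggest are separate symptoms.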

ltdrdata commented 5 months ago

During the workflow, check how much VRAM and RAM are being utilized. It's possible that swapping is occurring due to exceeding the RAM capacity.
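For reference, process RAM and GPU memory can be sampled with a small helper script while the workflow runs. This is a sketch assuming a POSIX system (for the stdlib `resource` module) and an NVIDIA GPU with `nvidia-smi` on the PATH; on Windows, Task Manager plus `nvidia-smi` gives the same numbers:

```python
import resource
import subprocess
import sys

def process_rss_mb():
    """Peak resident set size of the current process, in MB (POSIX only).
    ru_maxrss is reported in KB on Linux but in bytes on macOS."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

def gpu_memory_used_mb():
    """Query VRAM usage via nvidia-smi; returns None if unavailable."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"], text=True)
        return int(out.splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None

if __name__ == "__main__":
    print(f"RAM (peak RSS): {process_rss_mb():.0f} MB")
    vram = gpu_memory_used_mb()
    print(f"VRAM used: {vram} MB" if vram is not None else "nvidia-smi not found")
```

If RAM stays well below physical memory and swap stays at zero (as reported below), swapping can be ruled out as the cause.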

poisenbery commented 5 months ago

> VRAM and RAM

I'm running an RTX 3090 Ti with 24 GB VRAM and 64 GB physical RAM. I checked system processes during the "got prompt" stage: VRAM usage is 966 MB, `free -m` shows that swap is not being used, and RAM usage for Python is ~8 GB with momentary spikes to 10–12 GB.

whmc76 commented 5 months ago

> In my case, the problem only occurs once the node tree reaches a certain size. […]

My situation is exactly the same as yours: very low CPU/GPU usage, very low VRAM and RAM usage, and usually very fast on the second run.

poisenbery commented 5 months ago

> My situation is exactly the same as yours, very low cpu gpu usage, very low video memory and memory usage, and usually very fast on the second run

Try the Workflow Component

I have converted 80% of my workflow to workflow components, and it seems to massively mitigate the issue. (screenshot)

I suspect that converting to 100% workflow components will eliminate the issue entirely, but it takes some time to set up.

ltdrdata commented 5 months ago

It seems that based on the current discussion, there might be a part where processing time increases exponentially as the workflow becomes larger.
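One classic way this kind of blow-up appears is a recursive graph traversal that re-visits shared ancestors once per path instead of caching results. The sketch below is purely illustrative and is not ComfyUI's actual validation code: in a ladder-shaped DAG where each node in a layer depends on both nodes of the previous layer, the number of root-to-leaf paths doubles with every layer, so an un-memoized walk explodes while a cached walk stays linear in the node count.

```python
def build_ladder_graph(depth):
    """DAG with two nodes per layer, each depending on the whole previous
    layer; root-to-output paths double with every added layer."""
    graph = {0: []}          # node id -> list of dependency ids
    nid = 0
    prev = [0]
    for _ in range(depth):
        a, b = nid + 1, nid + 2
        graph[a] = list(prev)
        graph[b] = list(prev)
        prev = [a, b]
        nid += 2
    graph[nid + 1] = list(prev)   # single output node
    return graph, nid + 1

def visits_naive(graph, node):
    """Recursive visit count with no caching: exponential in depth."""
    return 1 + sum(visits_naive(graph, dep) for dep in graph[node])

def visits_cached(graph, node, seen=None):
    """Visit count when each node is processed at most once: linear."""
    if seen is None:
        seen = set()
    if node in seen:
        return 0
    seen.add(node)
    return 1 + sum(visits_cached(graph, dep, seen) for dep in graph[node])

graph, out = build_ladder_graph(10)
print(visits_naive(graph, out))    # 3071 visits for a graph of only 22 nodes
print(visits_cached(graph, out))   # 22
```

If something like this were happening anywhere in the pre-execution path, it would match the symptoms reported here: fine below a certain size, then runaway "got prompt" time, and mitigation when bypassing nodes or grouping them into components reduces the number of shared connections the traversal sees.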

whmc76 commented 5 months ago

> It seems that based on the current discussion, there might be a part where processing time increases exponentially as the workflow becomes larger.

It looks like this: every simple node runs exponentially slower, including seed, simple math, get size, load image, and crop image. Tasks that used to complete in an instant now take a few seconds; only the KSampler seems to behave normally. Everything returns to normal when the workflow is run a second time, but as soon as I load a different image it all goes bad again. I've tried disabling or replacing most of the components with equivalents, but nothing has solved the problem.

whmc76 commented 5 months ago

> Workflow Component

This looks like a very cool tool and I will try it, but I would rather use it to build my project better, not to work around problems. This hidden issue needs to be solved sooner or later.

poisenbery commented 5 months ago

> a part where processing time increases exponentially as the workflow becomes larger.

Yes, exactly this.

@whmc76 I think it would be good to edit the title to "Node processing increases exponentially for large workflows" so that the issue is clearer (and gets the attention it needs).