comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0

macbook m1, sdxl0.9 model, comfyui generation speed is much slower than webui, why? #948

Open · guoreex opened this issue 1 year ago

guoreex commented 1 year ago

I am a novice, so my question may be a bit simple for everyone, but I hope someone is willing to answer. Thank you. Recently I have been using SDXL 0.9 in ComfyUI and auto1111, and their generation speeds are very different. Computer: MacBook Pro, M1, 16GB RAM.

ComfyUI: 70s/it

auto1111 webui (dev): 5s/it

ComfyUI's node-based workflow is very attractive, but the speed on the Mac M1 is frustrating. Is anyone else in the same situation?

comfyanonymous commented 1 year ago

Yes, I don't have a Mac to test on, so the speed is not optimized there. You can try launching it with --force-fp16; if it works, it will increase speed.
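For anyone unsure where that flag goes: it is an argument to ComfyUI's launch script, so (assuming a source checkout, run from the repository root) the invocation would look like:

python main.py --force-fp16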

guoreex commented 1 year ago

Thanks for your reply. I tried --force-fp16, but it doesn't seem to work. I want to provide more test information, but I'm not a programmer and don't know what I can do to help ComfyUI run better on the Mac.

spikeyslam commented 1 year ago

--force-fp16 works when using the nightly version of PyTorch. It makes the workflow much faster. Using the default SDXL workflow on an MBP 16 M1 Pro 16GB:

100%|███████████████████████████████████████████| 20/20 [01:49<00:00,  5.46s/it]
100%|█████████████████████████████████████████████| 5/5 [00:36<00:00,  7.21s/it]
Prompt executed in 208.15 seconds
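A quick way to check that a given PyTorch build can actually see the Apple GPU (plain PyTorch API, nothing ComfyUI-specific):

python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"

If this prints False, ComfyUI falls back to the CPU and no launch flag will make it fast.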

zenyr commented 1 year ago

I'm getting 6~7s/it on M1 Max 64G, SDXL 1.0, running both base & refiner with --force-fp16

100%|██████████| 15/15 [01:35<00:00,  6.36s/it]
Prompt executed in 193.71 seconds # dpp + karras

(the attached image should have the workflow embedded in it)
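ComfyUI saves the workflow as JSON in the text metadata of the PNGs it writes, so it can be recovered from a saved output. A minimal sketch, assuming Pillow is installed and a locally saved file named output.png (illustrative filename):

from PIL import Image

img = Image.open("output.png")
# ComfyUI stores "prompt" and "workflow" entries in the PNG's text chunks
print(img.info.get("workflow"))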

alessandroperilli commented 1 year ago

When I started using ComfyUI with Pytorch nightly for macOS, at the beginning of August, the generation speed on my M2 Max with 96GB RAM was on par with A1111/SD.Next. Over time, it seemed to get a bit slower, but only negligibly.

However, at some point in the last two days, I noticed a drastic decrease in performance: ComfyUI generated images in twice the time it normally would with the same sampler, steps, CFG, etc.

In an attempt to fix things, I updated to today's Pytorch nightly* and the generation speed returned to approximately what I remembered.

Right now, I generate an image with the SDXL Base + Refiner models with the following settings:

macOS: 13.5.1 (22G90)
Base checkpoint: sd_xl_base_1.0_0.9vae
Refiner checkpoint: sd_xl_refiner_1.0_0.9vae
Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10
Aesthetic Score: 6

at the following speed:

(Base)    100%|##########| 52/52 [03:31<00:00, 4.06s/it]
(Refiner) 100%|##########| 18/18 [01:44<00:00, 5.83s/it]

If anyone could do a run with the same settings and let me know their results, I'd be grateful.

*To update your PyTorch nightly: exit ComfyUI, make sure you are still in the virtual environment, and then run:

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
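After upgrading, a quick sanity check that your virtual environment actually loads the new build:

python -c "import torch; print(torch.__version__)"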

alessandroperilli commented 1 year ago

Since I wrote the previous reply, I've experienced even more erratic behavior: now, when I activate things like a LoRA or ControlNet, the generation time goes up dramatically and, it seems, in proportion to how many things I activate. If I turn on 1 LoRA and 3 ControlNets, it takes something like 12 min to generate 1 image.

It never happened before and, clearly, there's something wrong. It might be something I've done with my workflow, but it's odd.

@comfyanonymous, does ComfyUI have a debug flag where I can see everything happening in the terminal, like SD.next?

neilmendoza commented 1 year ago

+1

I'm getting between 1.5it/s and 3it/s running SDXL 1.0 base without refiner with --force-fp16 on M2 Max with 96GB RAM.
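Note for readers comparing figures in this thread: tqdm switches units, printing it/s when a step takes under a second and s/it otherwise, so these numbers are reciprocals of the earlier ones. A trivial, purely illustrative conversion:

it_per_s = 1.5              # rate as reported above
s_per_it = 1.0 / it_per_s   # ≈ 0.67 s/it, vs. 70 s/it in the original report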

rogueturnip commented 1 year ago

I'm curious what version of macOS you are running? I started using ComfyUI today because Automatic1111 was crashing, which appears related to the macOS 14 Sonoma upgrade, so I'm curious whether this processing-speed issue could also be related. ComfyUI isn't anywhere near as fast as Automatic was before the crashing started.

neilmendoza commented 1 year ago

> I'm curious what version of macOS you are running? [...]

I'm running 13.6, Ventura.

alessandroperilli commented 1 year ago

> I'm curious what version of macOS you are running? [...]

14.0 (23A344), Sonoma.

guoreex commented 1 year ago

> When I started using ComfyUI with Pytorch nightly for macOS, at the beginning of August, the generation speed on my M2 Max with 96GB RAM was on par with A1111/SD.Next. [...]

Thanks a lot. I checked some solutions online, and today I fixed ComfyUI's slow speed by upgrading the PyTorch version in my local environment:

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

After the upgrade: PyTorch 2.2.0.dev20231009, Python 3.11.4

My system: macOS Sonoma 14.0, MacBook Pro M1, 16GB, sd_xl_base_1.0_0.9vae & sd_xl_refiner_1.0_0.9vae

Image size: 1024x1024

Launching it with --force-fp16. Generation speed:

100%|██████████| 20/20 [01:42<00:00, 5.12s/it]
100%|██████████| 20/20 [02:03<00:00, 6.15s/it]

alessandroperilli commented 1 year ago

@guoreex, can you run a generation with the exact same parameters as mine and report your speed?

Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10 (Pos & Neg)
Aesthetic Score: 6

(Your Base steps should be 75-80% of the total steps, leaving the remaining steps to the Refiner. So, in this example: 52 steps for the Base and 18 steps for the Refiner; see the sketch below.)
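A minimal sketch of that split, assuming a single total-step budget shared by both passes (plain Python; variable names are illustrative, not ComfyUI node inputs):

total_steps = 70
base_fraction = 0.75                            # 75-80% of the steps go to the Base
base_steps = int(total_steps * base_fraction)   # 52
refiner_steps = total_steps - base_steps        # 18

In the stock SDXL base+refiner workflow, this split is typically expressed through the KSampler (Advanced) node: end_at_step on the Base pass and a matching start_at_step on the Refiner pass, both with the same total steps.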

Omhet commented 11 months ago

M2 Pro, 16GB, Sonoma 14.1.1. I didn't upgrade to the torch nightly, but I can confirm that --force-fp16 works. It takes ~180 sec to generate with 20 steps of the SDXL base and 5 steps of the refiner.

If I try --use-split-cross-attention together with --force-fp16, it gets slower, ~200 sec.
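For reference, the flags combine on the same launch command as before (again assuming a source checkout run from the repository root):

python main.py --use-split-cross-attention --force-fp16

Split cross-attention is a lower-memory attention implementation, so trading some speed for memory would be consistent with the slowdown reported here.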

pechaut78 commented 11 months ago

M3 Max, 64GB, only 11GB used. GPU bound. Using the SDXL Turbo fp16 checkpoint with --force-fp16, I get 2.45 it/s on the sample workflow provided for Turbo.

In this video the guy gets 11 it/s on a 3090: https://www.youtube.com/watch?v=kApJkjjIhbs

TabassomArgi commented 10 months ago

I just tried ComfyUI on my Mac and was surprised by how slow it is! macOS Sonoma 14.1.1, Mac mini M2 Pro, 16GB RAM, run with --force-fp16, sd_xl_base_1.0, 1344x768px, DPM++ 2s Ancestral, Karras, steps 20, CFG 8: ~190 sec.

GeneralShan commented 2 months ago

I am having this problem too...