Open guoreex opened 1 year ago
Yes, I don't have a Mac to test on, so the speed is not optimized. You can try launching it with --force-fp16; if it works, it will increase speed.
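A minimal launch sketch, in case the flag placement is unclear (the ~/ComfyUI path and venv name are just placeholders for your own setup):
cd ~/ComfyUI
source venv/bin/activate   # or however you activate your virtual environment
python main.py --force-fp16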
Thanks for your reply. --force-fp16 doesn't seem to work for me. I'd like to provide more test information, but I'm not a programmer and don't know what would help Comfy run better on Mac.
--force-fp16 works when using the nightly version of PyTorch, and it makes the workflow much faster. Using the default SDXL workflow on a MBP 16 M1 Pro 16GB:
100%|███████████████████████████████████████████| 20/20 [01:49<00:00, 5.46s/it]
100%|█████████████████████████████████████████████| 5/5 [00:36<00:00, 7.21s/it]
Prompt executed in 208.15 seconds
I'm getting 6~7s/it on M1 Max 64G, SDXL 1.0, running both base & refiner with --force-fp16
100%|██████████| 15/15 [01:35<00:00, 6.36s/it]
Prompt executed in 193.71 seconds # dpmpp + karras
(the attached image may have the workflow embedded)
When I started using ComfyUI with PyTorch nightly for macOS, at the beginning of August, the generation speed on my M2 Max with 96GB RAM was on par with A1111/SD.Next. Progressively, it seemed to get a bit slower, but negligibly so.
However, at some point in the last two days, I noticed a drastic decrease in performance: ComfyUI generated images in twice the time it normally would with the same sampler, steps, cfg, etc.
In an attempt to fix things, I updated to today's PyTorch nightly* and the generation speed returned to approximately what I remembered.
Right now, I generate an image with the SDXL Base + Refiner models with the following settings:
MacOS: 13.5.1 (22G90)
Base checkpoint: sd_xl_base_1.0_0.9vae
Refiner checkpoint: sd_xl_refiner_1.0_0.9vae
Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10
Aesthetic Score: 6
at the following speed:
(Base) 100%|##########| 52/52 [03:31<00:00, 4.06s/it]
(Refiner) 100%|##########| 18/18 [01:44<00:00, 5.83s/it]
If anyone could do a run with the same settings and let me know their results, I'd be grateful.
*To update your PyTorch nightly: exit ComfyUI, make sure you are still in the virtual environment, and then run:
pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
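As a quick sanity check after the upgrade (just a sketch; your exact dev version string will differ), you can confirm which build is active and that the MPS backend is available:
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"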
Since I wrote the previous reply, I've experienced even more erratic behavior: now, when I activate things like a LoRA or ControlNet, the generation time goes up dramatically and, it seems, in proportion to how many things I activate. If I turn on 1 LoRA and 3 ControlNets, I get something like 12 min to generate 1 image.
It never happened before and, clearly, there's something wrong. It might be something I've done with my workflow, but it's odd.
@comfyanonymous, does ComfyUI have a debug flag where I can see everything happening in the terminal, like SD.next?
+1
I'm getting between 1.5it/s and 3it/s running SDXL 1.0 base without refiner with --force-fp16 on M2 Max with 96GB RAM.
I'm curious what version of macOS you are running. I started using Comfy today because Automatic1111 was crashing, which appears related to the macOS 14 Sonoma upgrade, so I'm curious whether this processing speed issue could also be related. Comfy isn't anywhere near as fast as Automatic was before the crashing started.
I'm running 13.6, Ventura.
14.0 (23A344), Sonoma.
Thanks a lot. I checked some solutions online, and today I solved the slow ComfyUI speed problem by upgrading the PyTorch version in my local environment with:
pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
After the upgrade: PyTorch 2.2.0.dev20231009, Python 3.11.4
My system: macOS Sonoma 14.0, MacBook Pro M1, 16GB RAM, sd_xl_base_1.0_0.9vae & sd_xl_refiner_1.0_0.9vae
Image size: 1024x1024
Launched with --force-fp16. Generation speed:
100%|██████████| 20/20 [01:42<00:00, 5.12s/it]
100%|██████████| 20/20 [02:03<00:00, 6.15s/it]
@guoreex can you run a generation with my same exact parameters and report your speed?
Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10 (Pos & Neg)
Aesthetic Score: 6
(Your Base steps should be 75-80% of the total steps, leaving the remaining steps to the Refiner. So, in this example: 52 steps for the Base and 18 steps for the Refiner)
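If it helps, a trivial sketch of that split for an arbitrary total step count, using 75% for the Base (this reproduces the 52/18 example above):
python -c "total=70; base=int(total*0.75); print('base:', base, 'refiner:', total-base)"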
M2 Pro, 16GB, Sonoma 14.1.1
I didn't upgrade to the torch nightly, but I confirm that --force-fp16 works. It takes ~180 sec to generate with 20 steps of SDXL base and 5 steps of refiner.
If I try with --use-split-cross-attention --force-fp16, it gets slower, ~200 sec.
M3 Max, 64GB, only 11GB used; GPU bound. Using SDXL Turbo fp16 with --force-fp16, I get 2.45 it/s on the sample workflow provided for Turbo.
In this video the guy gets 11 it/s on a 3090: https://www.youtube.com/watch?v=kApJkjjIhbs
I just tried ComfyUI on my Mac and I was surprised by how slow it was! macOS Sonoma 14.1.1, Mac mini M2 Pro, 16GB RAM, run with --force-fp16, sd_xl_base_1.0, 1344x768px, DPM++ 2s Ancestral, Karras, steps 20, CFG 8: ~190 sec
I am having this problem too...
I am a novice, so my question may be a bit simple, but I hope someone is willing to answer. Thank you. Recently I have been using SDXL 0.9 in ComfyUI and Auto1111, and their generation speeds are very different. Computer: MacBook Pro M1, 16GB RAM. ComfyUI: 70s/it
Auto1111 webui dev: 5s/it
ComfyUI's unique workflow is very attractive, but the speed on Mac M1 is frustrating. Is there anyone in the same situation as me?
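For anyone landing here with the same symptom, the two fixes reported earlier in this thread were upgrading to the PyTorch nightly build and relaunching with --force-fp16. A recap sketch, assuming you run ComfyUI from its repo root with a local venv (adjust paths to your own setup):
source venv/bin/activate
pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
python main.py --force-fp16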