brycedrennan / imaginAIry

Pythonic AI generation of images and videos
MIT License

Still super slow on M1 8GB machine #129

Closed · enzyme69 closed this issue 1 year ago

enzyme69 commented 1 year ago

I think I need 15-20 minutes to run a simple prompt at 20 samples.

I wonder if there should be an option to run using CoreML Stable Diffusion?

A memory error also happened:

Loading model /Users/blendersushi/.cache/huggingface/transformers/4c1a32af58eeaff9f36410f7ca27e51a8856185c3f05d5b930975a1397914f10.98fc1312797017a8bac6993df565908fd18f09319b40d9bd35457dfa1459ecf0 onto mps:0 backend...
 80%|█████████████████████████████████████████▌ | 12/15 [14:01<03:30, 70.10s/it]
RuntimeError: Not enough memory, use lower resolution (max approx. 448x448). Need: 0.0GB free, Have: 0.0GB free

enzyme69 commented 1 year ago

My feeling is also that the processing time is being reported incorrectly. The GPU and CPU seem maxed out, but generation is still slow. With Diffusion Bee it's faster, and I can still do other things at the same time.

enzyme69 commented 1 year ago

It took me 20 minutes to generate the output: 000006_335075681_kdpmpp2m15_PS7.5_a_drawing_of_a_girl_sitting_on_a_pillow_with_a_flower_in_her_hand_and_a_piggy_bank_in_front_of_her_with_a_flower_in_her_hand_

Quality is pretty good, but it's still too slow. If I use Stable Diffusion WebUI, it takes 6-8 minutes. (Other outputs: 000004_233126522_kdpmpp2m15_PS7.5_colorful_chickens_, 000005_471973997_kdpmpp2m15_PS7.5_pile_of_burger_.)

Apple's CoreML model loading is slow (Python vs. Swift mode), but I can get generations in 20-40 seconds.

Generating 🖼  1/1: "a drawing of a girl sitting on a pillow with a flower in her hand and a piggy bank in front of her with a flower in her hand" 512x512px seed:335075681 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
100%|████████████████████████████████████████████████████| 15/15 [19:35<00:00, 78.36s/it]
Downloading: 100%|███████████████████████████████████████| 342/342 [00:00<00:00, 166kB/s]
Downloading: 100%|██████████████████████████████████| 4.44k/4.44k [00:00<00:00, 1.01MB/s]
Downloading: 100%|██████████████████████████████████| 1.13G/1.13G [01:33<00:00, 13.0MB/s]
    Image Generated. Timings: conditioning:1.12s sampling:1175.51s safety-filter:102.11s total:1282.94s
    🖼  [generated] saved to: ./outputs/generated/000006_335075681_kdpmpp2m15_PS7.5_a_drawing_of_a_girl_sitting_on_a_pillow_with_a_flower_in_her_hand_and_a_piggy_bank_in_front_of_her_with_a_flower_in_her_hand_[generated].jpg
Cybergate9 commented 1 year ago

May not be much help, but here are some observations..

Using 7.0.0 on an M1 16GB gives me Python memory-use readings as high as 20GB (that's 8GB out on swap!), so I think running on 8GB of physical RAM is always going to be ambitious for the SD-2.0 codebase.
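For what it's worth, a quick way to watch that memory use is to poll the resident set size of the process. A rough sketch (the PID below is a stand-in for the actual `imagine` process, which you could find with e.g. `pgrep -f imagine`):

```shell
# Sample the resident memory of a running process (works on macOS and Linux).
# "$$" (the current shell) is a stand-in; substitute the real imagine PID.
pid=$$
rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
echo "resident memory: $((rss_kb / 1024)) MB"
```

Note this only shows resident memory; pages pushed out to swap won't appear in RSS, which is why Activity Monitor's "Memory" column can read much higher.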

Pretty sure DiffusionBee is on the SD-1.5 codebase.

If you install imaginAIry v5.1.0 (which I think is also on the SD-1.5 codebase), you should get generation times similar to DiffusionBee, in my experience.

hope this helps

brycedrennan commented 1 year ago

I think diffusion bee uses a different architecture entirely. It may run a lot more efficiently.

@enzyme69 To make sure I'm understanding correctly, you're saying it takes 20 minutes with imaginairy, 7 minutes with automatic webui, and 30 seconds with CoreML. I didn't realize the CoreML model was out and working. I would like to integrate that but realistically not sure when I'll find the time.

@Cybergate9 The 2.0 model shouldn't be taking more memory... I think. I consider it a bug if performance is worse in 7.0 than in 5.1. The 2.0v model, however, does require more memory and does run slower.

enzyme69 commented 1 year ago

Thanks for the explanation.

(base) blendersushi@192-168-1-102 ~ % imagine --model SD-2.0 "a giant smiling face stone at a greenforest"
🤖🧠 imaginAIry received 1 prompt(s) and will repeat them 1 times to create 1 images.
Generating 🖼  1/1: "a giant smiling face stone at a greenforest" 512x512px seed:878376886 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/blendersushi/.cache/huggingface/transformers/24bd254e54b30e83bcbc15efae29f0ef55256fd144823a9437f5956e594f6803.dcd6f0dd97c55495efb8393e64f704f3f398d695965876be0339bf96b93e2b4e onto mps:0 backend...
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 3.94G/3.94G [05:06<00:00, 12.9MB/s]
 13%|██████████████████                                 | 2/15 [01:12<07:58, 36.78s/it]
...
100%|███████████████████████████████████████████████████| 15/15 [08:12<00:00, 32.83s/it]
    Image Generated. Timings: conditioning:3.98s sampling:492.81s safety-filter:6.42s total:506.08s
    🖼  [generated] saved to: ./outputs/generated/000007_878376886_kdpmpp2m15_PS7.5_a_giant_smiling_face_stone_at_a_greenforest_[generated].jpg

After a restart and closing all apps, I did get a faster result, around 8 minutes. This might be as good as it gets for an 8 GB machine. I'd need 16-32 GB with an M1 Max or M2.

That's right

Cybergate9 commented 1 year ago

@Cybergate9 The 2.0 model shouldn't be taking more memory... I think. I consider it a bug if performance is worse in 7.0 than in 5.1. The 2.0v model, however, does require more memory and does run slower.

Nope, the difference has always been there for me between 6.x-7.x and 5.x, i.e. same command line, same model:

imagine "a picture of an 18th century lady in style of decoupage" --model SD-1.5

V5.1

Generating 🖼  1/1: "a picture of an 18th century lady in style of decoupage" 512x512px seed:907966062 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/shaun/.cache/huggingface/transformers/4c1a32af58eeaff9f36410f7ca27e51a8856185c3f05d5b930975a1397914f10.98fc1312797017a8bac6993df565908fd18f09319b40d9bd35457dfa1459ecf0 onto mps:0 backend...
100%|███████████████████████████████████████████████| 15/15 [00:17<00:00,  1.19s/it]
    Image Generated. Timings: conditioning:0.22s sampling:17.90s safety-filter:5.52s total:24.58s

V7.0.0

Generating 🖼  1/1: "a picture of an 18th century lady in style of decoupage" 512x512px seed:779724400 prompt-strength:7.5 steps:15 sampler-type:k_dpmpp_2m
Loading model /Users/shaun/.cache/huggingface/transformers/4c1a32af58eeaff9f36410f7ca27e51a8856185c3f05d5b930975a1397914f10.98fc1312797017a8bac6993df565908fd18f09319b40d9bd35457dfa1459ecf0 onto mps:0 backend...
100%|███████████████████████████████████████████| 15/15 [01:50<00:00,  7.36s/it]
    Image Generated. Timings: conditioning:0.30s sampling:110.45s safety-filter:6.60s total:119.26s

No idea why; I always assumed the new codebase released for SD-2 (and incorporated since 6.0.0a) was the difference?
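For reference, the per-step sampling timings in the two logs above work out to roughly a 6x regression. A quick back-of-the-envelope check:

```python
# Per-step sampling times from the two runs above (same model, same 15 steps).
v5_s_per_it = 1.19   # v5.1:   15/15 [00:17<00:00, 1.19s/it]
v7_s_per_it = 7.36   # v7.0.0: 15/15 [01:50<00:00, 7.36s/it]

slowdown = v7_s_per_it / v5_s_per_it
print(f"v7.0.0 is roughly {slowdown:.1f}x slower per step")  # ~6.2x
```

That factor is far larger than any expected model-size difference, which supports treating it as a performance bug rather than inherent SD-2.0 cost.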

Cybergate9 commented 1 year ago

@enzyme69 To make sure I'm understanding correctly, you're saying it takes 20 minutes with imaginairy, 7 minutes with automatic webui, and 30 seconds with CoreML. I didn't realize the CoreML model was out and working. I would like to integrate that but realistically not sure when I'll find the time.

The initial CoreML release is at: https://github.com/apple/ml-stable-diffusion

My two cents worth: