CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Instructions for setup and running on Mac Silicon chips #25

Open crsrusl opened 2 years ago

crsrusl commented 2 years ago

Hi,

I’ve heard it is possible to run Stable Diffusion on Mac Silicon (albeit slowly); it would be good to include basic setup instructions for doing this.

Thanks, Chris

henrique-galimberti commented 2 years ago

n_iter = sequential generations (increases total generation time but not memory usage). n_samples = parallel generations in a single batch (increases both generation time and memory usage).
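For example, a rough sketch of how the two flags combine (illustrative only; this is how I understand txt2img.py's loop):

# Illustrative only: txt2img runs n_iter sequential batches of n_samples images each.
n_iter = 4       # sequential batches: total runtime grows, peak memory does not
n_samples = 2    # images generated in parallel per batch: runtime and memory both grow
total_images = n_iter * n_samples
print(f"{total_images} images written to outputs/txt2img-samples")  # 8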

junukwon7 commented 2 years ago

Using the procedure above, it works fantastically well on my MacBook. Each batch took 28 seconds (M1 Max, 64 GB).

However, if I expand the size to 1024 by 1024, it returns an MPSNDArray error.

My result is: AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31', which is nearly identical to @filipux's result; only the message differs slightly.

Has anyone managed to run img2img? I just did a quick test with a 512x512 input image, but there is an error message:

AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'

This error also shows up when I try to change the width or height (--W/--H) in regular txt2img.

To make things clear, this problem occurs when I set the resolution higher than some limit. When using txt2img, the error occurs somewhere between 768 by 768 and 1024 by 1024, since 768 worked and 1024 failed. (Which is pretty obvious, since the error states that the product exceeded INT_MAX, so a bigger resolution leads to the error.)
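For a rough sense of scale, here is an illustrative back-of-the-envelope check. The shapes are my assumptions, not anything stated by the error: I'm assuming the largest self-attention map in the UNet has shape (2 * heads, tokens, tokens), with 8 heads and tokens = (H/8) * (W/8) latent positions, where the factor of 2 comes from classifier-free guidance doubling the batch.

# Back-of-the-envelope only; the shapes above are assumptions, not taken from the error message.
INT32_MAX = 2**31

for side in (512, 768, 1024):
    tokens = (side // 8) ** 2           # latent positions attended over
    elements = 2 * 8 * tokens * tokens  # (2 * heads, tokens, tokens) with 8 heads
    verdict = "overflows" if elements > INT32_MAX else "fits in"
    print(f"{side}x{side}: {elements:,} elements -> {verdict} a 32-bit size")

With these assumptions, 768x768 stays under 2**31 while 1024x1024 crosses it, which matches what I'm seeing.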

So, my questions are:

  1. Can you guys reproduce this error?
  2. Would lowering the resolution be the only solution? I still can't understand why these problems occur in GPU calculations, since GPUs are meant to handle large-scale parallel operations.
  3. Any recommendations for upscaling pictures (e.g., 512px to 1024px) in Apple Silicon environments?
jackwh commented 2 years ago

@junukwon7

I still can't understand why these problems occur in GPU calculations, since GPUs are meant to handle large-scale parallel operations.

This is bleeding-edge stuff; the underlying Python libraries aren't yet fully optimised for Apple Silicon and still contain some bugs and incompatibilities. These will get ironed out very soon, I'm sure.

Any recommendations for upscaling pictures (e.g., 512px to 1024px) in Apple Silicon environments?

  1. Nero Upscaler is free, web-based, and works quite well in my experience
  2. Gigapixel is an Apple Silicon-optimised native app. It costs $99 but seems to be very well reviewed from what I've seen.
HenkPoley commented 2 years ago

@cgodley Re: https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1224040740

I had to use conda update --force-reinstall -y -n base -c defaults conda (with --force-reinstall).

To get past:

RemoveError: 'cytoolz' is a dependency of conda and cannot be removed from
conda's operating environment.
RemoveError: 'setuptools' is a dependency of conda and cannot be removed from
conda's operating environment.

Good to know for people who are new to conda: to remove an environment when conda env create fails, run conda env remove -n ldm. Optionally, you can pass -n to conda env create to create the environment under a different name.

You also forgot to mention where to activate the conda environment.

E.g. where to run conda activate ldm and conda deactivate. I suppose directly after creation.

I believe you can add pytorch-nightly directly inside environment-mac.yaml under channels:.

You could pin these package versions I guess:

The path to functional.py is ~/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py

E.g.:

patch environment-mac.yaml < environment-mac.yaml.diff
patch ~/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py < functional.py.diff

You need an mkdir -p models/ldm/stable-diffusion-v1/ before (sym)linking the model.ckpt. Btw, I had issues with #47 when using a symlink, so I had to hardlink (drop the -s). Same with your description.

Anyways, thanks, it appears this works. Or at least the 'PLMS sampler' progresses instead of erroring out.

Edit: just over 8 minutes. Now, why does it produce two copies of the same/similar image in the output?

If you want to use a random seed: --seed "$(shuf -i 1-4294967295 -n1)"

My environment-mac.yaml.diff:

diff --git a/environment-mac.yaml b/environment-mac.yaml
index d923d56..a664a6a 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -3,14 +3,14 @@ channels:
   - pytorch
   - defaults
 dependencies:
-  - python=3.8.5
-  - pip=20.3
+  - python=3.10.4
+  - pip=22.1.2
   - pytorch=1.12.1
   - torchvision=0.13.1
-  - numpy=1.19.2
+  - numpy=1.23.1
   - pip:
-    - albumentations==0.4.3
-    - opencv-python==4.1.2.30
+    - albumentations==0.4.6
+    - opencv-python==4.6.0.66
     - pudb==2019.2
     - imageio==2.9.0
     - imageio-ffmpeg==0.4.2
@@ -20,7 +20,7 @@ dependencies:
     - streamlit>=0.73.1
     - einops==0.3.0
     - torch-fidelity==0.3.0
-    - transformers==4.19.2
+    - transformers==4.21.2
     - torchmetrics==0.6.0
     - kornia==0.6
     - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
HenkPoley commented 2 years ago

@cgodley I got it working: 8 minutes runtime, regular M1 8-core GPU, PLMS scheduler. But with lots of swapping, I think.

(Also, with n_samples=1 it takes 6 minutes and a few seconds when not swapping so much. I saw it briefly use 13.42 GB of RAM. It creates two samples in the outputs/txt2img-samples/samples/ directory. At the discharge rate of 21-22.6 I should be able to get about 22x2 images per charge 🙊)

Maybe you could integrate the things I mention in your comment?

magnusviri commented 2 years ago

I've updated my repo with instructions and the latest commits, which mainly add diffusers, the invisible watermark, and the NSFW filter (which is trivial to disable; if you don't know how, do a web search). I've also added a patch file and updated the environment-mac.yaml file with the pytorch-nightly channel and updated versions (as shown above).

I also have a question. Can anyone get seeds to work? When I set the seed, I always get a new image. I'm wondering if it's because of Apple Silicon; that's why I'm asking here. If you have this working, can you test it by adding --seed 12345678 and generating something twice to see if the results are the same?

HenkPoley commented 2 years ago

The result appears to be different each time.

You could try torch.use_deterministic_algorithms(True), but maybe Apple Silicon isn't supported (yet). https://pytorch.org/docs/stable/notes/randomness.html

Edit: it doesn't fix it
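For reference, this is roughly where I had tried it (a minimal sketch; I'm assuming it belongs near the top of scripts/txt2img.py, before the model is loaded):

# Sketch only: force deterministic behaviour where PyTorch supports it.
import torch
from pytorch_lightning import seed_everything

seed_everything(42)                       # same "Global seed set to 42" the script already prints
torch.use_deterministic_algorithms(True)  # errors out if an op has no deterministic implementation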

magnusviri commented 2 years ago

"Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds."

I bet it's MPS. There is an open issue in pytorch regarding gradient inconsistency. I am guessing that's what is causing this.

Birch-san commented 2 years ago

brilliant! I confirm that I was able to use @magnusviri's apple-silicon-mps-support branch successfully with pytorch nightly 1.13.0.dev20220823.

like everybody else has said: we have to modify torch.nn.functional.py::layer_norm to move the input tensor into contiguous memory, as described above (https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221416526), otherwise https://github.com/pytorch/pytorch/issues/80800 happens.
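roughly, the change amounts to this (a sketch of the workaround, not the upstream functional.py verbatim):

# Sketch: in torch/nn/functional.py, make layer_norm hand MPS a contiguous tensor.
import torch

def layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-5):
    # was: torch.layer_norm(input, ...); non-contiguous inputs trip
    # https://github.com/pytorch/pytorch/issues/80800 on the MPS backend,
    # so force a contiguous copy first.
    return torch.layer_norm(
        input.contiguous(), normalized_shape, weight, bias, eps,
        torch.backends.cudnn.enabled,
    )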

I took measurements (PLMS sampler doing 50 timesteps):
a batch-of-1 image can be generated in 39.5 secs.
a batch-of-3 images can be generated in 110 secs.

txt2img works.
img2img works.

gusmcnair commented 2 years ago

Anyone had any luck solving this issue?

AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'

johndpope commented 2 years ago

Awesome to see Mac silicon able to do this - Apple blocking CUDA has been such a bane to AI. Would love to see the M2 outperform my 3090. The Mac Pro is going to have > 128 GB of combined RAM, so you'd think it would trump this 24 GB NVIDIA card. I know NVIDIA is paddling hard to come up with an M1 equivalent. This is what is spat out by my card when running python scripts/txt2img.py --prompt "central park album cover":

Data shape for DDIM sampling is (3, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|█████████████████████████████████████████████████████████████████████| 50/50 [00:12<00:00,  4.13it/s]
data: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.14s/it]
Sampling:  50%|█████████████████████████████████████▌                                     | 1/2 [00:16<00:16, 16.14s/it]
Data shape for DDIM sampling is (3, 4, 64, 64), eta 0.0
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|█████████████████████████████████████████████████████████████████████| 50/50 [00:11<00:00,  4.17it/s]
data: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.10s/it]
Sampling: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.62s/it]
akaboshinit commented 2 years ago

Can someone tell me the solution...

env:

% python -V                                                                              
Python 3.10.4

% conda -V          
conda 4.14.0

error:

% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Traceback (most recent call last):
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
    main()
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 249, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 59, in load_model_from_config
    pl_sd = torch.load(ckpt, map_location="cpu")
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/serialization.py", line 735, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/serialization.py", line 942, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
cgodley commented 2 years ago

Loading model from models/ldm/stable-diffusion-v1/model.ckpt ... _pickle.UnpicklingError: invalid load key, 'v'.

@akaboshinit Did you remember to do git lfs checkout in the stable-diffusion-v-1-4-original project? If you did not do git lfs checkout or git lfs pull, then the contents of model.ckpt may be a git-lfs metadata placeholder instead of the real data.
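If you want to check quickly, here is a small sketch (the path is the one from the instructions above; adjust if yours differs). A git-lfs pointer is a tiny text file starting with the spec header below, while the real checkpoint is around 4 GB:

# Rough check: is model.ckpt the real checkpoint or just a git-lfs pointer?
from pathlib import Path

ckpt = Path("models/ldm/stable-diffusion-v1/model.ckpt")
with ckpt.open("rb") as f:
    head = f.read(64)
print(f"{ckpt.stat().st_size / 1e9:.2f} GB")
if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("Only an LFS pointer -- run `git lfs pull` in the repo you downloaded the weights from.")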

akaboshinit commented 2 years ago

@cgodley I didn't check enough... thanks! I'm ready to take the next step!

But I'm getting an error again. Please help.

% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.7.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.self_attn.out_proj.bias', 'vision_model.encoder.layers.23.layer_norm1.weight', 'vision_model.encoder.layers.1.mlp.fc1.bias', 'vision_model.encoder.layers.12.self_attn.v_proj.bias', 'vision_model.encoder.layers.7.layer_norm2.weight', 'vision_model.encoder.layers.2.mlp.fc2.weight', 'vision_model.encoder.layers.20.self_attn.v_proj.weight', 'vision_model.encoder.layers.4.self_attn.out_proj.bias', 'vision_model.encoder.layers.9.mlp.fc2.weight', 'vision_model.encoder.layers.17.self_attn.q_proj.weight', 'vision_model.encoder.layers.12.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.mlp.fc1.weight', 'vision_model.encoder.layers.10.mlp.fc1.weight', 'vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_model.encoder.layers.23.mlp.fc1.weight', 'vision_model.encoder.layers.12.layer_norm1.weight', 'vision_model.encoder.layers.1.self_attn.v_proj.weight', 'vision_model.encoder.layers.17.mlp.fc2.weight', 'vision_model.encoder.layers.23.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.out_proj.weight', 'vision_model.encoder.layers.10.layer_norm1.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_model.encoder.layers.3.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.mlp.fc1.bias', 'vision_model.encoder.layers.12.mlp.fc1.weight', 'vision_model.encoder.layers.21.self_attn.q_proj.weight', 'vision_model.encoder.layers.10.mlp.fc2.bias', 'vision_model.encoder.layers.10.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.self_attn.v_proj.bias', 'vision_model.encoder.layers.8.self_attn.v_proj.weight', 'vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_model.encoder.layers.2.mlp.fc2.bias', 'vision_model.encoder.layers.14.mlp.fc2.bias', 'vision_model.encoder.layers.1.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.self_attn.k_proj.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.weight', 'vision_model.encoder.layers.3.layer_norm2.weight', 'vision_model.encoder.layers.4.layer_norm2.bias', 'vision_model.encoder.layers.22.self_attn.out_proj.bias', 'vision_model.encoder.layers.23.self_attn.k_proj.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.layer_norm1.bias', 'vision_model.encoder.layers.3.mlp.fc2.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.weight', 'vision_model.encoder.layers.8.layer_norm2.bias', 'vision_model.encoder.layers.21.mlp.fc1.weight', 'vision_model.encoder.layers.18.mlp.fc2.bias', 'vision_model.encoder.layers.5.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_model.encoder.layers.6.layer_norm2.bias', 'vision_model.encoder.layers.22.self_attn.q_proj.weight', 'vision_model.encoder.layers.12.self_attn.v_proj.weight', 'vision_model.encoder.layers.21.layer_norm2.bias', 'vision_model.encoder.layers.22.layer_norm1.weight', 'vision_model.encoder.layers.22.layer_norm1.bias', 'vision_model.encoder.layers.12.mlp.fc2.bias', 'vision_model.encoder.layers.16.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.self_attn.v_proj.weight', 
'vision_model.encoder.layers.13.self_attn.v_proj.weight', 'vision_model.encoder.layers.22.layer_norm2.weight', 'vision_model.encoder.layers.8.mlp.fc2.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.bias', 'vision_model.encoder.layers.19.self_attn.k_proj.bias', 'vision_model.encoder.layers.17.layer_norm1.weight', 'vision_model.encoder.layers.18.layer_norm1.weight', 'vision_model.encoder.layers.15.layer_norm2.weight', 'vision_model.encoder.layers.14.layer_norm2.weight', 'vision_model.encoder.layers.11.mlp.fc1.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_model.encoder.layers.11.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.self_attn.v_proj.bias', 'vision_model.encoder.layers.1.self_attn.k_proj.bias', 'vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.layer_norm1.bias', 'vision_model.encoder.layers.15.mlp.fc2.bias', 'vision_model.encoder.layers.19.mlp.fc2.weight', 'vision_model.encoder.layers.23.self_attn.k_proj.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_model.encoder.layers.6.self_attn.q_proj.weight', 'vision_model.encoder.layers.14.mlp.fc1.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_model.encoder.layers.12.layer_norm2.bias', 'vision_model.encoder.layers.4.layer_norm2.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.layer_norm2.bias', 'vision_model.encoder.layers.13.self_attn.out_proj.bias', 'vision_model.encoder.layers.3.self_attn.k_proj.bias', 'vision_model.encoder.layers.19.self_attn.v_proj.weight', 'vision_model.encoder.layers.6.layer_norm1.bias', 'vision_model.encoder.layers.3.self_attn.v_proj.bias', 'vision_model.encoder.layers.20.layer_norm1.weight', 'vision_model.encoder.layers.18.layer_norm1.bias', 'vision_model.encoder.layers.4.layer_norm1.bias', 'vision_model.encoder.layers.2.self_attn.out_proj.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.weight', 'vision_model.encoder.layers.13.self_attn.v_proj.bias', 'vision_model.encoder.layers.3.self_attn.v_proj.weight', 'vision_model.encoder.layers.16.self_attn.out_proj.bias', 'vision_model.encoder.layers.22.self_attn.v_proj.bias', 'vision_model.encoder.layers.16.layer_norm2.bias', 'vision_model.encoder.layers.10.mlp.fc1.bias', 'vision_model.encoder.layers.2.layer_norm2.bias', 'vision_model.encoder.layers.15.mlp.fc1.bias', 'vision_model.encoder.layers.16.layer_norm1.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.bias', 'vision_model.encoder.layers.7.layer_norm2.bias', 'vision_model.encoder.layers.1.self_attn.k_proj.weight', 'vision_model.encoder.layers.18.layer_norm2.weight', 'vision_model.encoder.layers.7.layer_norm1.weight', 'vision_model.encoder.layers.17.layer_norm1.bias', 'vision_model.encoder.layers.19.layer_norm1.weight', 'vision_model.encoder.layers.16.self_attn.out_proj.weight', 'vision_model.encoder.layers.21.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.mlp.fc1.bias', 'vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_model.encoder.layers.15.layer_norm1.bias', 'vision_model.encoder.layers.22.self_attn.out_proj.weight', 'vision_model.encoder.layers.11.self_attn.k_proj.weight', 'vision_model.encoder.layers.0.mlp.fc1.bias', 
'vision_model.encoder.layers.23.layer_norm1.bias', 'vision_model.encoder.layers.17.self_attn.k_proj.bias', 'vision_model.encoder.layers.9.layer_norm1.bias', 'vision_model.encoder.layers.6.layer_norm1.weight', 'vision_model.encoder.layers.20.layer_norm1.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_model.encoder.layers.3.mlp.fc1.weight', 'vision_model.encoder.layers.11.layer_norm2.weight', 'vision_model.encoder.layers.2.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.mlp.fc1.bias', 'vision_model.encoder.layers.5.mlp.fc2.bias', 'vision_model.encoder.layers.10.self_attn.k_proj.bias', 'vision_model.encoder.layers.12.layer_norm1.bias', 'vision_model.encoder.layers.21.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.layer_norm2.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.weight', 'vision_model.encoder.layers.3.layer_norm2.bias', 'vision_model.encoder.layers.5.mlp.fc2.weight', 'vision_model.encoder.layers.5.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.self_attn.q_proj.weight', 'vision_model.encoder.layers.19.self_attn.v_proj.bias', 'vision_model.encoder.layers.14.self_attn.q_proj.weight', 'vision_model.encoder.layers.2.layer_norm1.weight', 'vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_model.encoder.layers.21.mlp.fc1.bias', 'vision_model.encoder.layers.2.layer_norm1.bias', 'vision_model.encoder.layers.6.mlp.fc2.weight', 'vision_model.encoder.layers.9.self_attn.out_proj.bias', 'vision_model.encoder.layers.16.mlp.fc2.weight', 'vision_model.encoder.layers.16.self_attn.q_proj.weight', 'vision_model.encoder.layers.21.self_attn.k_proj.bias', 'vision_model.encoder.layers.3.self_attn.q_proj.bias', 'vision_model.encoder.layers.8.self_attn.v_proj.bias', 'vision_model.encoder.layers.21.layer_norm2.weight', 'vision_model.encoder.layers.20.self_attn.out_proj.weight', 'vision_model.encoder.layers.2.layer_norm2.weight', 'vision_model.encoder.layers.21.self_attn.out_proj.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_model.encoder.layers.15.self_attn.v_proj.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_model.encoder.layers.14.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.mlp.fc2.bias', 'vision_model.encoder.layers.22.self_attn.q_proj.bias', 'vision_model.encoder.layers.8.mlp.fc1.bias', 'vision_model.encoder.layers.17.self_attn.v_proj.bias', 'vision_model.encoder.layers.22.layer_norm2.bias', 'vision_model.encoder.layers.22.mlp.fc2.bias', 'vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_model.encoder.layers.11.self_attn.k_proj.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_model.embeddings.class_embedding', 'vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_model.encoder.layers.10.layer_norm1.bias', 'vision_model.encoder.layers.11.mlp.fc2.weight', 'vision_model.encoder.layers.1.self_attn.v_proj.bias', 'vision_model.encoder.layers.3.mlp.fc1.bias', 'vision_model.encoder.layers.15.self_attn.out_proj.weight', 'vision_model.encoder.layers.1.layer_norm2.weight', 'vision_model.encoder.layers.12.mlp.fc2.weight', 'vision_model.encoder.layers.15.layer_norm2.bias', 'vision_model.encoder.layers.10.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.self_attn.v_proj.weight', 'vision_model.encoder.layers.23.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.self_attn.v_proj.weight', 'vision_model.encoder.layers.9.layer_norm2.bias', 
'vision_model.encoder.layers.1.mlp.fc2.weight', 'vision_model.encoder.layers.5.layer_norm1.weight', 'vision_model.encoder.layers.21.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.self_attn.v_proj.weight', 'vision_model.encoder.layers.10.layer_norm2.weight', 'vision_model.encoder.layers.23.layer_norm2.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.bias', 'vision_model.encoder.layers.13.mlp.fc1.weight', 'vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_model.encoder.layers.20.self_attn.v_proj.bias', 'vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_model.encoder.layers.15.layer_norm1.weight', 'vision_model.encoder.layers.7.self_attn.v_proj.bias', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.15.self_attn.k_proj.bias', 'vision_model.encoder.layers.4.self_attn.v_proj.bias', 'vision_model.encoder.layers.11.layer_norm2.bias', 'vision_model.encoder.layers.10.layer_norm2.bias', 'visual_projection.weight', 'vision_model.encoder.layers.1.mlp.fc1.weight', 'vision_model.encoder.layers.6.self_attn.v_proj.bias', 'vision_model.encoder.layers.11.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.bias', 'vision_model.encoder.layers.0.self_attn.q_proj.weight', 'vision_model.encoder.layers.6.self_attn.out_proj.weight', 'vision_model.encoder.layers.8.self_attn.k_proj.weight', 'vision_model.encoder.layers.2.self_attn.q_proj.bias', 'vision_model.encoder.layers.17.self_attn.k_proj.weight', 'vision_model.encoder.layers.22.self_attn.v_proj.weight', 'vision_model.encoder.layers.23.self_attn.q_proj.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_model.encoder.layers.19.layer_norm2.bias', 'vision_model.encoder.layers.7.mlp.fc2.bias', 'vision_model.encoder.layers.23.self_attn.q_proj.bias', 'vision_model.encoder.layers.17.layer_norm2.weight', 'vision_model.encoder.layers.11.layer_norm1.weight', 'vision_model.encoder.layers.22.self_attn.k_proj.bias', 'vision_model.encoder.layers.23.mlp.fc2.bias', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.8.layer_norm2.weight', 'vision_model.encoder.layers.8.layer_norm1.bias', 'vision_model.encoder.layers.4.mlp.fc2.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.weight', 'vision_model.encoder.layers.21.layer_norm1.weight', 'vision_model.post_layernorm.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.bias', 'vision_model.encoder.layers.0.self_attn.q_proj.bias', 'vision_model.encoder.layers.21.mlp.fc2.weight', 'vision_model.encoder.layers.18.self_attn.k_proj.bias', 'vision_model.encoder.layers.20.mlp.fc1.weight', 'vision_model.encoder.layers.15.mlp.fc2.weight', 'logit_scale', 'vision_model.encoder.layers.1.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.bias', 'vision_model.encoder.layers.16.self_attn.k_proj.bias', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.10.self_attn.v_proj.bias', 'vision_model.encoder.layers.11.mlp.fc2.bias', 'vision_model.encoder.layers.12.layer_norm2.weight', 'vision_model.encoder.layers.16.self_attn.q_proj.bias', 'vision_model.encoder.layers.6.self_attn.k_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.weight', 'vision_model.encoder.layers.7.mlp.fc2.weight', 'vision_model.encoder.layers.4.self_attn.out_proj.weight', 'vision_model.encoder.layers.16.mlp.fc2.bias', 'vision_model.encoder.layers.1.layer_norm2.bias', 
'vision_model.encoder.layers.13.self_attn.k_proj.bias', 'vision_model.encoder.layers.21.layer_norm1.bias', 'vision_model.encoder.layers.17.mlp.fc1.bias', 'vision_model.encoder.layers.5.layer_norm2.weight', 'vision_model.encoder.layers.8.layer_norm1.weight', 'vision_model.encoder.layers.11.mlp.fc1.weight', 'vision_model.encoder.layers.19.mlp.fc1.bias', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.self_attn.out_proj.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.bias', 'vision_model.encoder.layers.7.self_attn.out_proj.weight', 'vision_model.encoder.layers.4.self_attn.q_proj.bias', 'vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_model.encoder.layers.20.mlp.fc2.weight', 'vision_model.encoder.layers.20.layer_norm2.bias', 'vision_model.encoder.layers.13.layer_norm1.weight', 'vision_model.encoder.layers.4.layer_norm1.weight', 'vision_model.encoder.layers.6.self_attn.out_proj.bias', 'vision_model.encoder.layers.9.self_attn.out_proj.weight', 'vision_model.encoder.layers.14.mlp.fc2.weight', 'vision_model.encoder.layers.9.self_attn.k_proj.weight', 'vision_model.encoder.layers.18.self_attn.out_proj.bias', 'vision_model.encoder.layers.11.self_attn.v_proj.bias', 'vision_model.encoder.layers.13.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_model.encoder.layers.16.layer_norm1.weight', 'vision_model.embeddings.position_ids', 'vision_model.encoder.layers.3.self_attn.out_proj.bias', 'vision_model.encoder.layers.20.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.layer_norm2.weight', 'vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.bias', 'vision_model.encoder.layers.15.self_attn.q_proj.weight', 'vision_model.encoder.layers.19.mlp.fc2.bias', 'vision_model.encoder.layers.4.mlp.fc1.weight', 'vision_model.encoder.layers.20.self_attn.q_proj.weight', 'vision_model.encoder.layers.1.layer_norm1.weight', 'vision_model.encoder.layers.8.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.3.layer_norm1.bias', 'vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_model.pre_layrnorm.weight', 'vision_model.encoder.layers.1.layer_norm1.bias', 'vision_model.encoder.layers.5.layer_norm2.bias', 'vision_model.encoder.layers.20.mlp.fc2.bias', 'vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.layer_norm2.bias', 'vision_model.encoder.layers.15.self_attn.q_proj.bias', 'vision_model.encoder.layers.19.layer_norm2.weight', 'vision_model.encoder.layers.4.self_attn.v_proj.weight', 'vision_model.encoder.layers.13.layer_norm1.bias', 'vision_model.encoder.layers.13.layer_norm2.weight', 'vision_model.encoder.layers.2.self_attn.k_proj.bias', 'vision_model.encoder.layers.12.self_attn.q_proj.bias', 'vision_model.encoder.layers.14.layer_norm2.bias', 'vision_model.encoder.layers.23.mlp.fc1.bias', 'vision_model.encoder.layers.1.mlp.fc2.bias', 'vision_model.encoder.layers.22.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.layer_norm1.bias', 'vision_model.encoder.layers.10.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.weight', 'vision_model.encoder.layers.19.self_attn.k_proj.weight', 'vision_model.pre_layrnorm.bias', 'vision_model.encoder.layers.16.mlp.fc1.bias', 'vision_model.encoder.layers.18.mlp.fc1.bias', 'vision_model.encoder.layers.23.self_attn.out_proj.weight', 
'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.6.mlp.fc1.weight', 'vision_model.encoder.layers.13.layer_norm2.bias', 'vision_model.encoder.layers.16.self_attn.v_proj.bias', 'vision_model.encoder.layers.18.mlp.fc1.weight', 'vision_model.encoder.layers.13.mlp.fc1.bias', 'vision_model.encoder.layers.14.layer_norm1.bias', 'vision_model.encoder.layers.21.mlp.fc2.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.weight', 'vision_model.encoder.layers.7.mlp.fc1.weight', 'vision_model.encoder.layers.16.layer_norm2.weight', 'vision_model.encoder.layers.2.self_attn.out_proj.bias', 'vision_model.encoder.layers.6.layer_norm2.weight', 'vision_model.encoder.layers.17.mlp.fc1.weight', 'vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.post_layernorm.bias', 'vision_model.encoder.layers.21.self_attn.q_proj.bias', 'vision_model.encoder.layers.3.layer_norm1.weight', 'vision_model.encoder.layers.6.self_attn.q_proj.bias', 'vision_model.encoder.layers.6.mlp.fc1.bias', 'vision_model.encoder.layers.9.mlp.fc2.bias', 'vision_model.encoder.layers.23.layer_norm2.weight', 'vision_model.encoder.layers.3.self_attn.k_proj.weight', 'vision_model.encoder.layers.2.self_attn.q_proj.weight', 'vision_model.encoder.layers.16.mlp.fc1.weight', 'vision_model.encoder.layers.4.mlp.fc2.bias', 'vision_model.encoder.layers.9.layer_norm2.weight', 'vision_model.encoder.layers.12.self_attn.q_proj.weight', 'vision_model.encoder.layers.8.self_attn.out_proj.bias', 'vision_model.encoder.layers.7.self_attn.q_proj.bias', 'vision_model.encoder.layers.15.mlp.fc1.weight', 'vision_model.encoder.layers.9.layer_norm1.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.bias', 'vision_model.encoder.layers.0.mlp.fc2.weight', 'vision_model.encoder.layers.8.mlp.fc1.weight', 'vision_model.encoder.layers.5.self_attn.k_proj.bias', 'vision_model.encoder.layers.14.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.self_attn.out_proj.weight', 'text_projection.weight', 'vision_model.encoder.layers.6.self_attn.v_proj.weight', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.encoder.layers.9.self_attn.k_proj.bias', 'vision_model.encoder.layers.20.self_attn.out_proj.bias', 'vision_model.encoder.layers.6.self_attn.k_proj.bias', 'vision_model.encoder.layers.14.mlp.fc1.bias', 'vision_model.encoder.layers.11.layer_norm1.bias', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.weight', 'vision_model.encoder.layers.8.mlp.fc2.weight', 'vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_model.encoder.layers.14.layer_norm1.weight', 'vision_model.encoder.layers.8.self_attn.k_proj.bias', 'vision_model.encoder.layers.9.self_attn.q_proj.bias', 'vision_model.encoder.layers.13.mlp.fc2.bias', 'vision_model.encoder.layers.5.self_attn.q_proj.weight', 'vision_model.encoder.layers.22.mlp.fc1.bias', 'vision_model.encoder.layers.9.mlp.fc1.weight', 'vision_model.encoder.layers.3.mlp.fc2.bias', 'vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.bias', 'vision_model.encoder.layers.14.self_attn.out_proj.bias', 'vision_model.encoder.layers.21.self_attn.v_proj.bias', 'vision_model.encoder.layers.17.self_attn.out_proj.bias']
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                                                                   | 0/2 [00:00<?, ?it/s]
Data shape for PLMS sampling is (3, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler:   0%|                                                                              | 0/50 [00:06<?, ?it/s]
data:   0%|                                                                                       | 0/1 [00:46<?, ?it/s]
Sampling:   0%|                                                                                   | 0/2 [00:46<?, ?it/s]
Traceback (most recent call last):
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
    main()
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 306, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 103, in sample
    samples, intermediates = self.plms_sampling(conditioning, size,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 158, in plms_sampling
    outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 224, in p_sample_plms
    e_t = get_model_output(x, t)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 191, in get_model_output
    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/attention.py", line 254, in forward
    x = self.norm(x)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2526, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
Birch-san commented 2 years ago

@akaboshinit

RuntimeError: expected scalar type BFloat16 but found Float

this happened to me when I built PyTorch from source from Raymonf's fork of kulinseth's branch.

fixed it by upgrading PyTorch to the latest nightly release.

akaboshinit commented 2 years ago

@Birch-san Thanks for the reply. I thought I had already done that myself just in case, but it's the command below, right?

cmd:

conda install pytorch torchvision torchaudio -c pytorch-nightly

version:

% python3 -c 'import torch; print(torch.__version__) '
1.13.0.dev20220823
Birch-san commented 2 years ago

yes. that's what I get too.

(ldm) ➜  stable-diffusion git:(apple-silicon-mps-support) ✗ python3 -c 'import torch; print(torch.__version__) '
1.13.0.dev20220823

I'm not sure then what the problem could be. you've activated the right conda env, right? was your failing python script running in the same environment as the one that's printing this torch version?

and you're running from this branch?
https://github.com/magnusviri/stable-diffusion/tree/apple-silicon-mps-support
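one quick sanity check you can run inside the activated env (illustrative):

# Confirm the env your script runs in actually has the nightly build + MPS support.
import torch

print(torch.__version__)                  # should show the 1.13.0.dev nightly
print(torch.backends.mps.is_built())      # True if this build includes MPS support
print(torch.backends.mps.is_available())  # True if the MPS device can actually be used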

junukwon7 commented 2 years ago

@magnusviri The instructions you provided somehow don't work in my environment.

Traceback (most recent call last):
  File "/Users/username/Desktop/stable-diffusion/scripts/txt2img.py", line 8, in <module>
    from imwatermark import WatermarkEncoder
ModuleNotFoundError: No module named 'imwatermark'
  1. pip install imwatermark returns that the requirement is already satisfied.
  2. pip install invisible-watermark returns an error, since onnx doesn't support Python 3.10 yet (at least it can be built from source).

Is there any reason to change to Python 3.10? I'm also curious how you got imwatermark working.

Thank you for your work

cgodley commented 2 years ago

I switched to Python 3.10 because it was the first thing I tried that solved some conda conflicts. It's probably possible to resolve them some other way without changing the Python version.

marcfon commented 2 years ago

I had exactly the same issue yesterday and managed to fix it while staying on 3.10.4.

The solution is not to try and install onnx / onnxruntime manually, which I was doing after getting the error.

I'm not entirely sure anymore, but I think these commands fixed it for me: pip install transformers==4.19.2 diffusers invisible-watermark and/or conda install pytorch torchvision -c pytorch


Birch-san commented 2 years ago

@junukwon7

@magnusviri The instructions you provided somehow don't work in my environment.

Traceback (most recent call last):
  File "/Users/username/Desktop/stable-diffusion/scripts/txt2img.py", line 8, in <module>
    from imwatermark import WatermarkEncoder
ModuleNotFoundError: No module named 'imwatermark'

oh yeah, I forgot to mention: this happened to me too (including the problems trying to build onnx from source).
I just modified txt2img.py and commented out anything related to WatermarkEncoder, because I can live without watermarks on my images 😛
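a sketch of the kind of edit I mean (not an exact diff of scripts/txt2img.py):

# Comment out the import and the encoder setup so nothing from
# invisible-watermark/onnx is needed, and let put_watermark pass images through.

# from imwatermark import WatermarkEncoder        # <- commented out

wm_encoder = None                                  # instead of WatermarkEncoder()

def put_watermark(img, wm_encoder=None):
    # the original embeds an invisible watermark via wm_encoder; with the
    # encoder left as None we just return the image unchanged
    return img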

cgodley commented 2 years ago

Error: product of dimension sizes > 2**31

@junukwon7 This also happens to me if I set the resolution too high for the amount of video memory on my machine.

Setting --n_samples 1 reduces the memory usage of txt2img so there might be a chance that you can handle a higher resolution using that setting.

junukwon7 commented 2 years ago

Finally figured out why I got errors that didn't happen before! Yesterday I cloned einanao's repo, and it had no watermark code, since the watermark was added two days ago. However, magnusviri's repo has it. @Birch-san, your solution looks both legit and simple :)

junukwon7 commented 2 years ago

Error: product of dimension sizes > 2**31

@junukwon7 This also happens to me if I set the resolution too high for the amount of video memory on my machine.

Setting --n_samples 1 reduces the memory usage of txt2img so there might be a chance that you can handle a higher resolution using that setting.

@cgodley @Birch-san Thanks for the reply. However, I've already set n_samples to 1...

EDIT: Looks like the maximum VRAM for MPS is not 64 GB - maybe somewhere around 16? It would also make sense that insufficient VRAM led to the exception. This needs some more research.

Also, since my VRAM has plenty of free space (64 GB) while the process uses only 14 GB, and the error states that some number exceeded 2**31, which is INT_MAX, I think it's some kind of soft limit, which means having more VRAM won't help :(

Birch-san commented 2 years ago

sounds like the limit comes from the fact that something's being computed in 32-bit?

saibakatar commented 2 years ago

Hey everyone, following the steps in @cgodley 's comment and some additions in @HenkPoley 's comments. After doing that, I'm getting the following error when I finally give the prompt: zsh: segmentation fault python scripts/txt2img.py --prompt

Any suggestions please?

junukwon7 commented 2 years ago

Hey everyone, following the steps in @cgodley 's comment and some additions in @HenkPoley 's comments. After doing that, I'm getting the following error when I finally give the prompt: zsh: segmentation fault python scripts/txt2img.py --prompt

Any suggestions please?

@saibakatar Could you provide full error log and your command?

saibakatar commented 2 years ago

Hey everyone, following the steps in @cgodley 's comment and some additions in @HenkPoley 's comments. After doing that, I'm getting the following error when I finally give the prompt: zsh: segmentation fault python scripts/txt2img.py --prompt Any suggestions please?

@saibakatar Could you provide full error log and your command?

Hi @junukwon7 , it's as follows:

stable-diffusion % python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --seed 222
zsh: segmentation fault  python scripts/txt2img.py --prompt  --plms --seed 222
junukwon7 commented 2 years ago

@saibakatar Although I'm not sure what triggered the segmentation fault, I can still provide some general solutions.

  1. Clear your memory. If your Mac has 8 GB of RAM, it might be a harsh environment for the model to run in.
  2. Try adding the flags --n_samples 1 --n_rows 1 --n_iter 1. These flags will reduce memory consumption.

Hope these help.

saibakatar commented 2 years ago

2. --n_samples 1 --n_rows 1 --n_iter 1

Thanks. The issue continued even after including the flags above. My M1 has 64 GB of RAM, so it's probably not that. I'll just try again from scratch, I guess. Thanks again.

akaboshinit commented 2 years ago

@junukwon7 help me... How do I fix it?

(ldm) redstar16: ~/project/akaboshinit/clone/stable-diffusion (apple-silicon-mps-support|✚1…)
% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.encoder.layers.6.layer_norm1.bias', 'vision_model.encoder.layers.13.layer_norm1.weight', 'vision_model.encoder.layers.22.self_attn.k_proj.bias', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.22.mlp.fc1.bias', 'vision_model.encoder.layers.5.mlp.fc1.weight', 'vision_model.encoder.layers.11.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_model.encoder.layers.19.mlp.fc1.bias', 'vision_model.encoder.layers.6.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_model.encoder.layers.12.mlp.fc2.bias', 'vision_model.encoder.layers.16.layer_norm1.bias', 'vision_model.encoder.layers.19.self_attn.v_proj.weight', 'vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.encoder.layers.17.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_model.encoder.layers.22.layer_norm1.bias', 'vision_model.encoder.layers.23.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_model.encoder.layers.20.layer_norm2.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_model.encoder.layers.7.self_attn.q_proj.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.bias', 'vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.bias', 'vision_model.encoder.layers.18.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.layer_norm2.bias', 'vision_model.encoder.layers.1.self_attn.out_proj.bias', 'vision_model.encoder.layers.7.mlp.fc2.weight', 'vision_model.encoder.layers.10.layer_norm2.weight', 'vision_model.encoder.layers.4.layer_norm2.bias', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_model.encoder.layers.11.layer_norm1.weight', 'vision_model.encoder.layers.15.mlp.fc1.weight', 'vision_model.encoder.layers.19.layer_norm2.weight', 'vision_model.encoder.layers.10.self_attn.v_proj.weight', 'vision_model.encoder.layers.8.layer_norm2.weight', 'vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_model.encoder.layers.6.mlp.fc1.bias', 'vision_model.encoder.layers.22.layer_norm2.bias', 'vision_model.encoder.layers.5.self_attn.k_proj.bias', 'vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_model.encoder.layers.21.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.layer_norm2.bias', 'vision_model.encoder.layers.3.layer_norm1.bias', 'vision_model.encoder.layers.20.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.self_attn.q_proj.weight', 'vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.weight', 'vision_model.encoder.layers.7.self_attn.k_proj.weight', 'vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.self_attn.out_proj.weight', 'vision_model.encoder.layers.6.self_attn.q_proj.weight', 'vision_model.encoder.layers.21.self_attn.q_proj.weight', 
... (several hundred more unused CLIP vision-tower weight names omitted for brevity: vision_model.encoder.layers.*.{self_attn,mlp,layer_norm}.{weight,bias}, vision_model.embeddings.*, vision_model.pre_layrnorm.*, vision_model.post_layernorm.*, visual_projection.weight, text_projection.weight, logit_scale)]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|                                                                                   | 0/1 [00:00<?, ?it/s]
Data shape for PLMS sampling is (1, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler:   0%|                                                                              | 0/50 [00:05<?, ?it/s]
data:   0%|                                                                                       | 0/1 [00:19<?, ?it/s]
Sampling:   0%|                                                                                   | 0/1 [00:19<?, ?it/s]
Traceback (most recent call last):
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
    main()
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 306, in main
    samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 103, in sample
    samples, intermediates = self.plms_sampling(conditioning, size,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 158, in plms_sampling
    outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 224, in p_sample_plms
    e_t = get_model_output(x, t)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 191, in get_model_output
    e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
    h = module(h, emb, context)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
    x = layer(x, context)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/attention.py", line 254, in forward
    x = self.norm(x)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    return F.group_norm(
  File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2524, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
junukwon7 commented 2 years ago

Check https://github.com/junukwon7/stable-diffusion out.

I've combined several suggestions and troubleshooting methods.

How to do it

When using einanao's repo

git clone https://github.com/einanao/stable-diffusion.git
cd stable-diffusion
git checkout apple-silicon

mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt

conda create -n ldm python=3.8
conda activate ldm

conda install pytorch torchvision torchaudio -c pytorch-nightly
pip install kornia albumentations opencv-python pudb imageio imageio-ffmpeg pytorch-lightning omegaconf test-tube streamlit einops torch-fidelity transformers
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .

When using Magnusviri's repo

git clone https://github.com/magnusviri/stable-diffusion
cd stable-diffusion
git checkout apple-silicon-mps-support

mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt

conda env create -f environment-mac.yaml
conda activate ldm

Then, as @einanao suggested, append .contiguous() at ldm/models/diffusion/plms.py#L27, so it looks like:

-        attr = attr.to(torch.float32).to(torch.device(self.device_available))
+        attr = attr.to(torch.float32).to(torch.device(self.device_available)).contiguous()

Add a new line x = x.contiguous() after ldm/modules/attention.py#L211, so it looks like:

def _forward(self, x, context=None):
+       x = x.contiguous()
        x = self.attn1(self.norm1(x)) + x

Or, you can just modify the torch library as @magnusviri suggested in his repo.

Finally

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1

Thanks all for finding ways to make Stable Diffusion work on Apple Silicon Macs.

Troubleshooting

Could not build wheels for tokenizers

ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Install the Rust compiler and the issue should be resolved; Rust is required to build the wheel for tokenizers:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
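After the installer finishes, you may need to restart the terminal or run the line below (which the rustup installer itself suggests) so that cargo and rustc are on your PATH:

source "$HOME/.cargo/env"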

ModuleNotFoundError: No module named 'imwatermark'

Just remove the watermark-generating code from txt2img.py, as sketched below.
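Concretely, comment out (or delete) the watermark-related lines in scripts/txt2img.py. A rough sketch only; the exact lines may differ slightly in your checkout:

# from imwatermark import WatermarkEncoder
...
# wm_encoder = WatermarkEncoder()
# wm_encoder.set_watermark('bytes', wm.encode('utf-8'))
...
# img = put_watermark(img, wm_encoder)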

MPSNDArray

[MPSNDArray / MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'

Reduce the width and height (--W/--H) to lower memory consumption; see the example below.
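For example, using the flags shown elsewhere in this thread (dimensions should be multiples of 64; 512x512 is the resolution the model was trained at):

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1 --n_iter 1 --W 448 --H 448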

RuntimeError

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Add .contiguous() as mentioned above

@saibakatar Seems like my system is identical to yours (M1 Max, 64 GB). Hope you can find a solution.

@akaboshinit As far as I know, PyTorch on macOS doesn't support BFloat16. Would you try reinstalling the environment?

If it still doesn't work, try both einanao's and magnusviri's repos.
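A quick way to sanity-check that the reinstalled environment has an MPS-enabled PyTorch build (a minimal sketch; these attributes exist in recent PyTorch nightlies):

import torch
print(torch.__version__)
print(torch.backends.mps.is_built())      # True if this build includes the MPS backend
print(torch.backends.mps.is_available())  # True on macOS 12.3+ with an Apple Silicon GPU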

cannin commented 2 years ago

Error

Error I encountered on a 14" M1 Max 64GB with https://github.com/magnusviri/stable-diffusion/tree/apple-silicon-mps-support

      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Solution to above and other issues

What got it working for me:

  1. Switched to Mamba (https://github.com/mamba-org/mamba); something I read made me think it could resolve osx-arm64 dependencies better
  2. In environment-mac.yaml: add conda-forge and apple channels; transformers>=4.9; pip=22.2.2
  3. Run brew install rust; not sure which of 1-3 actually fixed the tokenizers problem
  4. Find the pytorch installation location as mentioned by @cgodley, then edit nn/functional.py (per @filipux) to use
    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

where previously the first torch.layer_norm() parameter was just input (see the diff after the test command below)

  5. Grab the weights file sd-v1-4.ckpt from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
  6. Test it with:
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --ckpt 'sd-v1-4.ckpt' --H 512 --W 512 --seed 42 --n_iter 1 --ddim_steps 50
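For step 4, the same edit in diff form (matching the patch style used elsewhere in this thread):

-    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
+    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)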

New environment-mac.yaml

name: ldm
channels:
  - apple
  - conda-forge
  - pytorch-nightly
  - defaults
dependencies:
  - python=3.10.4
  - pip=22.2.2
  - pytorch=1.13.0.dev20220824
  - torchvision
  - numpy=1.23.1
  - pip:
    - albumentations==0.4.6
    - diffusers
    - opencv-python==4.6.0.66
    - pudb==2019.2
    - imageio==2.9.0
    - imageio-ffmpeg==0.4.2
    - pytorch-lightning==1.4.2
    - omegaconf==2.1.1
    - test-tube>=0.7.5
    - streamlit>=0.73.1
    - einops==0.3.0
    - torch-fidelity==0.3.0
    - transformers>=4.9
    - torchmetrics==0.6.0
    - kornia==0.6
    - imwatermark==0.0.2
    - invisible-watermark==0.1.5
    - -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
    - -e git+https://github.com/openai/CLIP.git@main#egg=clip
    - -e .
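To create and activate an environment from this file (same commands as earlier in the thread):

conda env create -f environment-mac.yaml
conda activate ldm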
HenkPoley commented 2 years ago

@saibakatar https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1225849407

stable-diffusion %

That prompt doesn't look like you are inside a conda environment; it doesn't say (base) or (ldm). Did you run conda activate ldm? (Or whatever you named it, if you created the environment with conda env create -f environment-mac.yaml -n something_else.)

That environment you created is set up to run torch with stable diffusion on the M1 GPU.
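To double-check which environments exist and switch into the right one:

conda env list      # the active environment is marked with an asterisk
conda activate ldm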

magnusviri commented 2 years ago

@Birch-san @marcfon @junukwon7 @akaboshinit Doh! I forgot to include the invisible-watermark requirement in the environment-mac.yaml file. I've fixed it now. I don't know if imwatermark is required. Sorry I didn't test it yesterday. Except I still haven't tested it, so I must not be too sorry... If you get around to reinstalling, let me know if it works.

hdevalence commented 2 years ago

Following the instructions on an M1 Max MBP running macOS 12.5.1 gave an error about duplicate OpenMP versions:

OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
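As the hint in the message itself suggests, an unsafe stopgap is to allow the duplicate OpenMP runtime for a single run (the cleaner fix is in the next comment):

KMP_DUPLICATE_LIB_OK=TRUE python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms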
videah commented 2 years ago

Following the instructions on an M1 Max MBP running macOS 12.5.1 gave an error about duplicate OpenMP versions:

OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

I fixed this issue by switching to the Apple Silicon version of miniforge instead of Anaconda and using this as my environment-mac.yaml

index 22fd28b..aaf63e9 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -1,11 +1,12 @@
 name: ldm
 channels:
+  - conda-forge
   - pytorch-nightly
   - defaults
 dependencies:
   - python=3.10.4
   - pip=22.1.2
-  - pytorch
+  - pytorch=1.13.0.dev20220824
   - torchvision
   - numpy=1.23.1
   - pip:
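After editing environment-mac.yaml like this, recreating the environment from scratch (rather than updating it in place) avoids stale packages; roughly:

conda env remove -n ldm
conda env create -f environment-mac.yaml
conda activate ldm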
junukwon7 commented 2 years ago

@magnusviri

Thanks for the reply!

I'm afraid the installation still doesn't work. Installing invisible-watermark triggers an error because one of its dependencies, onnx, fails while building its wheel (due to Python 3.10, macOS, or cmake? Still not clear).

The solution is to just strip the watermark code out. Since CompVis added invisible-watermark only two days ago, rolling that code back causes no problems.

Also, while reading through your description, I thought editing the torch library itself seemed somewhat risky. As @einanao suggested, just adding .contiguous() in the model files also resolves the issue.

Thanks again for your great work!

hdevalence commented 2 years ago

I fixed this issue by switching to the Apple Silicon version of miniforge instead of Anaconda and using this as my environment-mac.yaml

This also worked for me, but only after also switching to the Apple Silicon version of miniforge. Just making those changes to the YAML file resulted in these MKL errors (posting in case someone else runs into this):

Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

I followed these instructions to install miniforge.
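For reference, a minimal sketch of installing Miniforge on Apple Silicon (check the conda-forge/miniforge README for the current installer name):

curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh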

furmanlukasz commented 2 years ago

I just checked out the solutions from the last 24 hours, and it looks like switching to miniforge is a must; there is no way to make it work with Anaconda. On Anaconda I reproduce all the listed errors.

DaveMariner commented 2 years ago

I just checked out the solutions from the last 24 hours, and it looks like switching to miniforge is a must; there is no way to make it work with Anaconda. On Anaconda I reproduce all the listed errors.

Yeah - I'm still hitting the float64 issue running your sample code in that fork....

But I did see float32 mentioned in the thread. Maybe that will help?

https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221416526

Edit: for reference, we came here from this thread: https://github.com/pytorch/pytorch/issues/77764#issuecomment-1227121050

magnusviri commented 2 years ago

I have it working with Anaconda. I don't know what I've done differently. When I have time (it takes an hour just to read all the comments!) I'll try to figure something out. I'm trying to make my repo usable for all Mac users (until it hopefully gets merged).

Also, I think I'm having this issue: https://github.com/CompVis/stable-diffusion/issues/69

My images are black though. I also get crashes.

/Users/james/.conda/envs/ldm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

I'm running this on 2020 MacBook Air M1 w/ 16GB RAM and 8-core GPU, and 2020 Mac Mini M1 w/ 16GB RAM and 8-core GPU.

marcfon commented 2 years ago

I have it working with Anaconda. I don't know what I've done differently. When I have time (it takes an hour just to read all the comments!) I'll try to figure something out. I'm trying to make my repo usable for all Mac users (until it hopefully gets merged).

I can second that it works with Anaconda on a MacBook Pro M1.

cristobal-est commented 2 years ago

My two cents: I followed @cannin's steps with two additions:

1. I had to update macOS Monterey to the latest version, since the PyTorch MPS backend only works on 12.3+. I speculate this is what solved the RuntimeError: expected scalar type BFloat16 but found Float error, since I think one of the repos was silently falling back to the CPU and ending up with this error.

2. Also, I installed onnx from Anaconda before installing the pip dependencies, which I think also resolved some dependency errors.

alexeydemidovkz commented 2 years ago

Maybe it will help someone: I got it running on a 16" M1 Max, 32 GB (32-core GPU). It takes around 30 seconds to generate an image (the first cosmonaut prompt gave me an NSFW warning :). The GPU spikes to 95 percent, no fans. I used the sd-v1-4.ckpt model file. einanao's version gave me CUDA errors; magnusviri's worked. In magnusviri's version there was a pip install error for the onnx package, which I fixed via:

brew install cmake
brew install protobuf

Thanks everyone for all of your work this is pretty amazing!

pjexposito commented 2 years ago

On my computer (MacBook Pro M1 with 8 GB RAM) the process took around 25 minutes to generate an image. What's wrong? Am I using the CPU instead of the GPU? Thanks!
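One rough way to check whether the GPU is actually being used is to time a large matmul on cpu vs mps (a minimal sketch, assuming a PyTorch nightly with MPS support; numbers are only indicative):

import time
import torch

def bench(device, n=2048, iters=20):
    x = torch.randn(n, n, device=device)
    start = time.time()
    total = 0.0
    for _ in range(iters):
        # .item() copies the result back to the CPU, forcing the GPU work to finish
        total += (x @ x).sum().item()
    return time.time() - start

print("cpu:", bench("cpu"))
print("mps:", bench("mps"))  # should be much faster if the MPS backend is actually in use

If the mps timing is not clearly faster than cpu, the model is probably running on the CPU.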

Any-Winter-4079 commented 2 years ago

Does anyone have img2img working using an M1/M2?

I got txt2img to work on both CPU and GPU thanks to your repos (I created a mini guide here: https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/).

img2img gives me Error: product of dimension sizes > 2**31 with 64 GB of RAM.

Did anyone get it to work?

pnodseth commented 2 years ago

Does anyone have img2img working using an M1/M2?

I got txt2img to work on both CPU and GPU thanks to your repos (I created a mini guide here: https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/).

img2img gives me Error: product of dimension sizes > 2**31 with 64 GB of RAM.

Did anyone get it to work?

I got this exact error on img2img as well.

Any-Winter-4079 commented 2 years ago

Does anyone have img2img working using an M1/M2? I got txt2img to work on both CPU and GPU thanks to your repos (I created a mini guide here: https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/). img2img gives me Error: product of dimension sizes > 2**31 with 64 GB of RAM. Did anyone get it to work?

I got this exact error on img2img as well.

I got it to work with a 256x256 image. Thanks @junukwon7 for the suggestion. It still doesn't work with 512x512, though.
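For anyone else trying the smaller size, a sketch of the sort of commands involved (sips is the built-in macOS resizer; init.png stands for your own input image; flag names follow the repo's img2img example):

sips -Z 256 init.png --out init_256.png
python scripts/img2img.py --prompt "a fantasy landscape, trending on artstation" --init-img init_256.png --strength 0.75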

alexeydemidovkz commented 2 years ago

I have a very weird thing going on and I'm a bit worried about my machine. I noticed there is a sound (a bit like an HDD sound) coming from my laptop while the txt2img progress bar is running. The sound is in sync with the progress bar in the terminal: it starts with the progress bar and stops when it finishes. Can anyone with a 16" M1 Max or another machine relate? :) I know this sounds weird, but it happens every time.