n_iter = sequential generations (increases generation time but not memory usage). n_samples = parallel generations executed as one batch (increases both generation time and memory usage).
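Roughly speaking, the two flags map onto an outer loop and a batch size in the sampling script. Here is a tiny self-contained sketch of that relationship (generate_batch is a stand-in of mine, not a function from the repo):
def generate_batch(batch_size):
    # stand-in for one sampler call that produces batch_size images in parallel
    return [f"image_{i}" for i in range(batch_size)]

def generate(n_iter, n_samples):
    images = []
    for _ in range(n_iter):                       # sequential passes: more time, same peak memory
        images.extend(generate_batch(n_samples))  # bigger batch: more time and more memory
    return images                                 # total images = n_iter * n_samples

print(len(generate(n_iter=2, n_samples=3)))       # -> 6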
Using the procedure above, it works fantastically well on my MacBook. Each batch took 28 seconds (M1 Max, 64 GB).
However, if I increase the size to 1024x1024, it returns an MPSNDArray error.
My result is:
AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31' | 0/50 [00:00<?, ?it/s]
which is nearly identical to @filipux's result, though the message is slightly different.
Has anyone managed to run img2img? I just did a quick test with a 512x512 input image, but I get this error message:
AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'
This error also shows up when I try to change the width or height (--W/--H) in regular txt2img.
To make things clear, this problem occurs when I try to make the resolution higher than some limit. When using txt2img, the error occurs somewhere between 768x768 and 1024x1024, since 768 worked and 1024 failed. (Which is pretty obvious, since the error states that the product exceeded INT_MAX, so a bigger resolution leads to the error.)
So, the questions are:
1. I still can't understand why these problems occur in GPU calculations, since GPUs are meant to handle large-scale parallel operations.
2. Any recommendations for upscaling pictures (e.g., 512px to 1024px) in Apple Silicon environments?
@junukwon7 This is bleeding-edge stuff; the underlying Python libraries aren't yet fully optimised for Apple Silicon and still contain some bugs and incompatibilities. These will get ironed out very soon, I'm sure.
@cgodley Re: https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1224040740
I had to use conda update --force-reinstall -y -n base -c defaults conda (with --force-reinstall).
To get past:
RemoveError: 'cytoolz' is a dependency of conda and cannot be removed from
conda's operating environment.
RemoveError: 'setuptools' is a dependency of conda and cannot be removed from
conda's operating environment.
Good to know for people who are new to conda: to remove an environment when conda env create fails, run conda env remove -n ldm. Optionally, with conda env create you can use the -n option to create the environment under another name.
You also forgot to mention where to activate the conda environment, i.e., where conda activate ldm and conda deactivate go. I suppose directly after creation.
I believe you can add pytorch-nightly directly inside environment-mac.yaml under channels:.
You could pin these package versions I guess:
The path to functional.py is ~/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py
E.g.:
patch environment-mac.yaml < environment-mac.yaml.diff
patch ~/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py < functional.py.diff
You need an mkdir -p models/ldm/stable-diffusion-v1/ before (sym)linking the model.ckpt. Btw, I had issues with #47 when using a symlink, so I had to hardlink (drop the -s). Same with your description.
Anyways, thanks, it appears this works. Or at least the 'PLMS sampler' progresses instead of erroring.
Edit: just over 8 minutes. Now, why does the output contain the same/similar image twice?
If you want to use a random seed: --seed "$(shuf -i 1-4294967295 -n1)"
My environment-mac.yaml.diff:
diff --git a/environment-mac.yaml b/environment-mac.yaml
index d923d56..a664a6a 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -3,14 +3,14 @@ channels:
- pytorch
- defaults
dependencies:
- - python=3.8.5
- - pip=20.3
+ - python=3.10.4
+ - pip=22.1.2
- pytorch=1.12.1
- torchvision=0.13.1
- - numpy=1.19.2
+ - numpy=1.23.1
- pip:
- - albumentations==0.4.3
- - opencv-python==4.1.2.30
+ - albumentations==0.4.6
+ - opencv-python==4.6.0.66
- pudb==2019.2
- imageio==2.9.0
- imageio-ffmpeg==0.4.2
@@ -20,7 +20,7 @@ dependencies:
- streamlit>=0.73.1
- einops==0.3.0
- torch-fidelity==0.3.0
- - transformers==4.19.2
+ - transformers==4.21.2
- torchmetrics==0.6.0
- kornia==0.6
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
@cgodley I got it working: 8 minutes runtime, regular M1 with an 8-core GPU, PLMS scheduler. But with lots of swapping, I think.
(Also, with n_samples=1 it took 6 minutes and a few seconds when not swapping so much. I saw it briefly use 13.42 GB of RAM. It creates two samples in the outputs/txt2img-samples/samples/ directory. With a discharge rate of 21-22.6 I should be able to get about 22x2 images per charge 🙊)
Maybe you could integrate the things I mention in your comment?
I've updated my repo with instructions and the latest commits, which mainly add diffusers, the invisible watermark, and the NSFW filter (which is trivial to disable; if you don't know how, do a web search). I've also added a patch file and updated the environment-mac.yaml file with the pytorch-nightly channel and updated versions (as shown above).
I also have a question: can anyone get seeds to work? When I set the seed I always get a new image. I'm wondering if it's because of Apple Silicon, which is why I'm asking here. If you have this working, can you test it by adding --seed 12345678 and generating something twice to see if the result is the same?
The result appears to be different each time.
You could try torch.use_deterministic_algorithms(True), but maybe Apple Silicon isn't supported (yet): https://pytorch.org/docs/stable/notes/randomness.html
Edit: it doesn't fix it.
"Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds."
I bet it's MPS. There is an open issue in pytorch regarding gradient inconsistency. I am guessing that's what is causing this.
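If someone wants to narrow down whether the nondeterminism comes from the sampler or from the MPS RNG itself, here's a minimal sketch of a test (my own idea, not code from this repo) that seeds twice and compares raw torch.randn output on the mps device:
import torch

# Pick the MPS device if available; fall back to CPU otherwise.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def sample_noise(seed):
    # torch.manual_seed is meant to seed the default generators; whether the MPS
    # generator honours it depends on the nightly build you're running.
    torch.manual_seed(seed)
    return torch.randn(4, 64, 64, device=device)

a = sample_noise(12345678)
b = sample_noise(12345678)
print(torch.equal(a.cpu(), b.cpu()))  # False would point at the RNG/backend rather than the sampler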
Brilliant! I confirm that I was able to use @magnusviri's apple-silicon-mps-support branch successfully with PyTorch nightly 1.13.0.dev20220823.
Like everybody else has said: we have to modify layer_norm in torch/nn/functional.py to move the input tensor into contiguous memory, as described above (https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221416526); otherwise https://github.com/pytorch/pytorch/issues/80800 happens.
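For anyone who would rather not hand-edit site-packages, an equivalent workaround (my own sketch with the same effect as the file edit described above, not something from the repo) is to monkeypatch layer_norm near the top of scripts/txt2img.py:
import torch
import torch.nn.functional as F

_orig_layer_norm = F.layer_norm

def _contiguous_layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-5):
    # MPS mishandles non-contiguous inputs here (pytorch/pytorch#80800),
    # so force the input into contiguous memory before the real call.
    return _orig_layer_norm(input.contiguous(), normalized_shape, weight, bias, eps)

F.layer_norm = _contiguous_layer_norm  # nn.LayerNorm.forward picks this up at call time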
I took measurements (PLMS sampler doing 50 timesteps):
a batch-of-1 image can be generated in 39.5 secs.
a batch-of-3 images can be generated in 110 secs.
Has anyone had any luck solving this issue?
AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'
Awesome to see Apple Silicon able to do this; CUDA being blocked by Apple has been such a bane to AI. Would love to see the M2 outperform my 3090. The Mac Pro is going to have >128 GB of combined RAM, so you'd think it would trump this 24 GB NVIDIA card. I know NVIDIA are paddling hard to come up with an M1 equivalent. This is what is spat out by my card when running python scripts/txt2img.py --prompt "central park album cover":
Data shape for DDIM sampling is (3, 4, 64, 64), eta 0.0 | 0/1 [00:00<?, ?it/s]
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|█████████████████████████████████████████████████████████████████████| 50/50 [00:12<00:00, 4.13it/s]
data: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.14s/it]
Sampling: 50%|█████████████████████████████████████▌ | 1/2 [00:16<00:16, 16.14s/itData shape for DDIM sampling is (3, 4, 64, 64), eta 0.0 | 0/1 [00:00<?, ?it/s]
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 100%|█████████████████████████████████████████████████████████████████████| 50/50 [00:11<00:00, 4.17it/s]
data: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.10s/it]
Sampling: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:31<00:00, 15.62s/it]
Can someone tell me the solution...
env:
% python -V
Python 3.10.4
% conda -V
conda 4.14.0
error:
% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Traceback (most recent call last):
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
main()
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 249, in main
model = load_model_from_config(config, f"{opt.ckpt}")
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 59, in load_model_from_config
pl_sd = torch.load(ckpt, map_location="cpu")
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/serialization.py", line 735, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/serialization.py", line 942, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Loading model from models/ldm/stable-diffusion-v1/model.ckpt ... _pickle.UnpicklingError: invalid load key, 'v'.
@akaboshinit Did you remember to do git lfs checkout in the stable-diffusion-v-1-4-original project? If you did not do git lfs checkout or git lfs pull, then the contents of model.ckpt may be a Git LFS metadata placeholder instead of the real data.
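A quick way to check (my own snippet, not part of the repo; the path is assumed to match the layout used above) is to look at the first bytes of the file: a Git LFS pointer is a small text file starting with "version https://git-lfs...", which is also where the 'v' in "invalid load key, 'v'" comes from:
from pathlib import Path

ckpt = Path("models/ldm/stable-diffusion-v1/model.ckpt")
head = ckpt.read_bytes()[:64]
if head.startswith(b"version https://git-lfs"):
    print("This is an LFS pointer, not the checkpoint - run `git lfs pull` and re-link the file.")
else:
    print(f"Looks like real data ({ckpt.stat().st_size / 1e9:.2f} GB), first bytes: {head[:8]!r}")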
@cgodley I didn't check that carefully enough... thanks! I'm ready to take the next step!
But now I'm hitting another error. Please help.
% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.7.self_attn.q_proj.weight', ... (long list of unused vision_model.*, visual_projection, text_projection, and logit_scale weights omitted)]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling: 0%| | 0/2 [00:00<?, ?it/sData shape for PLMS sampling is (3, 4, 64, 64) | 0/1 [00:00<?, ?it/s]
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 0%| | 0/50 [00:06<?, ?it/s]
data: 0%| | 0/1 [00:46<?, ?it/s]
Sampling: 0%| | 0/2 [00:46<?, ?it/s]
Traceback (most recent call last):
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
main()
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 306, in main
samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 103, in sample
samples, intermediates = self.plms_sampling(conditioning, size,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 158, in plms_sampling
outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 224, in p_sample_plms
e_t = get_model_output(x, t)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 191, in get_model_output
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
out = self.diffusion_model(x, t, context=cc)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
h = module(h, emb, context)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
x = layer(x, context)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/attention.py", line 254, in forward
x = self.norm(x)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2526, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
@akaboshinit
RuntimeError: expected scalar type BFloat16 but found Float
This happened to me when I built PyTorch from source from Raymonf's fork of kulinseth's branch.
I fixed it by upgrading PyTorch to the latest nightly release.
@Birch-san Thanks for the reply. I thought I had already done that just in case, but you mean the command below, right?
cmd:
conda install pytorch torchvision torchaudio -c pytorch-nightly
version:
% python3 -c 'import torch; print(torch.__version__) '
1.13.0.dev20220823
Yes, that's what I get too.
(ldm) ➜ stable-diffusion git:(apple-silicon-mps-support) ✗ python3 -c 'import torch; print(torch.__version__) '
1.13.0.dev20220823
I'm not sure then what the problem could be. You've activated the right conda env, right? Was your failing Python script running in the same environment as the one printing this torch version?
And you're running from this branch?
https://github.com/magnusviri/stable-diffusion/tree/apple-silicon-mps-support
@magnusviri The instructions you provided somehow don't work in my environment.
Traceback (most recent call last):
File "/Users/username/Desktop/stable-diffusion/scripts/txt2img.py", line 8, in <module>
from imwatermark import WatermarkEncoder
ModuleNotFoundError: No module named 'imwatermark'
pip install imwatermark says the requirement is already satisfied.
pip install invisible-watermark returns an error, since onnx doesn't support Python 3.10 yet (at best it can be built from source).
Is there any reason to change to Python 3.10? I'm also curious how you got imwatermark working.
Thank you for your work.
I switched to Python 3.10 because it was the first thing I tried that solved some conda conflicts. It's probably possible to resolve them some other way without changing the Python version.
I had exactly the same issue yesterday and managed to fix it while staying on 3.10.4.
The solution is not to try and install onnx / onnxruntime manually, which I was doing after getting the error.
I'm not entirely sure anymore, but I think these commands fixed it for me: pip install transformers==4.19.2 diffusers invisible-watermark and/or conda install pytorch torchvision -c pytorch
@junukwon7
@magnusviri The instructions you provided somehow don't work in my environment.
Traceback (most recent call last): File "/Users/username/Desktop/stable-diffusion/scripts/txt2img.py", line 8, in <module> from imwatermark import WatermarkEncoder ModuleNotFoundError: No module named 'imwatermark'
Oh yeah, I forgot to mention: this happened to me too (including the problems trying to build onnx from source). I just modified txt2img.py and commented out anything related to WatermarkEncoder, because I can live without watermarks on my images 😛
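In case it helps, the edit amounts to something like the following (illustrative only; the exact lines in scripts/txt2img.py vary by commit, and the put_watermark helper name is my reading of the current script rather than a guarantee):
# from imwatermark import WatermarkEncoder              # commented out

def put_watermark(img, wm_encoder=None):
    # no-op replacement: skip the invisible watermark entirely
    return img

# wm = "StableDiffusionV1"                              # commented out
# wm_encoder = WatermarkEncoder()                       # commented out
# wm_encoder.set_watermark('bytes', wm.encode('utf-8')) # commented out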
Error: product of dimension sizes > 2**31
@junukwon7 This also happens to me if I set the resolution too high for the amount of video memory on my machine.
Setting --n_samples 1 reduces the memory usage of txt2img, so there might be a chance you can handle a higher resolution with that setting.
Finally figured out why I got errors that didn't happen before! Yesterday I cloned einanao's repo, which had no watermark, since the watermark was only added two days ago. However, magnusviri's repo has it. @Birch-san, your solution looks both legit and simple :)
Error: product of dimension sizes > 2**31
@junukwon7 This also happens to me if I set the resolution too high for the amount of video memory on my machine.
Setting --n_samples 1 reduces the memory usage of txt2img, so there might be a chance you can handle a higher resolution with that setting.
@cgodley @Birch-san Thanks for the reply. However, I've already set n_samples to 1...
EDIT: It looks like the maximum VRAM for MPS is not 64 GB but somewhere around 16? It would also make sense that insufficient VRAM leads to the exception. This needs some more research.
Also, since my VRAM has plenty of headroom (64 GB) while the process only uses 14 GB, and the error states that some number exceeded 2**31 (INT_MAX), I think it's some kind of soft limit, which means having more VRAM won't help :(
Sounds like the limit comes from the fact that something's being computed in 32-bit?
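That guess lines up with some back-of-the-envelope arithmetic. Assuming the overflowing allocation is the UNet self-attention map over the 8x-downsampled latent, with 8 heads and a cond+uncond batch of 2 (all assumptions on my part, not something the error reports), the element count crosses 2**31 exactly between 768 and 1024:
INT32_MAX = 2**31

def attention_elements(width, height, batch=2, heads=8):
    # assumed shape: batch (cond + uncond) x heads x tokens x tokens,
    # where tokens are the spatial positions of the 8x-downsampled latent
    tokens = (width // 8) * (height // 8)
    return batch * heads * tokens * tokens

for size in (512, 768, 1024):
    n = attention_elements(size, size)
    print(f"{size}x{size}: {n:,} elements -> {'over' if n >= INT32_MAX else 'under'} 2**31")
# 512x512 and 768x768 stay under the limit; 1024x1024 lands at 4,294,967,296, which overflows.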
Hey everyone, I'm following the steps in @cgodley's comment plus some additions from @HenkPoley's comments. After doing that, I get the following error when I finally run the prompt: zsh: segmentation fault python scripts/txt2img.py --prompt
Any suggestions, please?
@saibakatar Could you provide full error log and your command?
Hi @junukwon7, it's as follows:
stable-diffusion % python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --seed 222
zsh: segmentation fault python scripts/txt2img.py --prompt --plms --seed 222
@saibakatar Although I'm not sure what triggered the segmentation fault, I can still offer some general suggestions: try adding --n_samples 1 --n_rows 1 --n_iter 1; these flags will reduce memory consumption. Hope they help.
--n_samples 1 --n_rows 1 --n_iter 1
Thanks. The issue continued after including the flags above. My M1 has 64 GB of RAM, so it's probably not a memory problem. I'll just try again from scratch, I guess. Thanks again.
@junukwon7 help me... How do I fix it?
(ldm) redstar16: ~/project/akaboshinit/clone/stable-diffusion (apple-silicon-mps-support|✚1…)
% python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1
Global seed set to 42
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.encoder.layers.6.layer_norm1.bias', ... (long list of unused vision_model.*, visual_projection, text_projection, and logit_scale weights omitted)]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Creating invisible watermark encoder (see https://github.com/ShieldMnt/invisible-watermark)...
Sampling:   0%|          | 0/1 [00:00<?, ?it/s]
Data shape for PLMS sampling is (1, 4, 64, 64)
Running PLMS Sampling with 50 timesteps
PLMS Sampler: 0%| | 0/50 [00:05<?, ?it/s]
data: 0%| | 0/1 [00:19<?, ?it/s]
Sampling: 0%| | 0/1 [00:19<?, ?it/s]
Traceback (most recent call last):
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 355, in <module>
main()
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/scripts/txt2img.py", line 306, in main
samples_ddim, _ = sampler.sample(S=opt.ddim_steps,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 103, in sample
samples, intermediates = self.plms_sampling(conditioning, size,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 158, in plms_sampling
outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 224, in p_sample_plms
e_t = get_model_output(x, t)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/plms.py", line 191, in get_model_output
e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 987, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/models/diffusion/ddpm.py", line 1410, in forward
out = self.diffusion_model(x, t, context=cc)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 732, in forward
h = module(h, emb, context)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/diffusionmodules/openaimodel.py", line 85, in forward
x = layer(x, context)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/project/akaboshinit/clone/stable-diffusion/ldm/modules/attention.py", line 254, in forward
x = self.norm(x)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/Users/redstar16/.asdf/installs/python/miniforge3-4.10.1-5/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2524, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
Check https://github.com/junukwon7/stable-diffusion out.
I've combined several suggestions and troubleshooting methods.
With @einanao's fork:
git clone https://github.com/einanao/stable-diffusion.git
cd stable-diffusion
git checkout apple-silicon
mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt
conda create -n ldm python=3.8
conda activate ldm
conda install pytorch torchvision torchaudio -c pytorch-nightly
pip install kornia albumentations opencv-python pudb imageio imageio-ffmpeg pytorch-lightning omegaconf test-tube streamlit einops torch-fidelity transformers
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
pip install -e git+https://github.com/openai/CLIP.git@main#egg=clip
pip install -e .
Alternatively, with @magnusviri's fork:
git clone https://github.com/magnusviri/stable-diffusion
cd stable-diffusion
git checkout apple-silicon-mps-support
mkdir -p models/ldm/stable-diffusion-v1/
ln -s /path/to/ckpt/sd-v1-1.ckpt models/ldm/stable-diffusion-v1/model.ckpt
conda env create -f environment-mac.yaml
conda activate ldm
Then, as @einanao suggested:
Append .contiguous() at ldm/models/diffusion/plms.py#L27, so it looks like:
- attr = attr.to(torch.float32).to(torch.device(self.device_available))
+ attr = attr.to(torch.float32).to(torch.device(self.device_available)).contiguous()
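For context, the patched method would look roughly like this (the surrounding lines are assumed from the apple-silicon fork; only the trailing .contiguous() is the suggested change):

def register_buffer(self, name, attr):
    # Sketch of PLMSSampler.register_buffer after the edit; structure assumed from
    # the apple-silicon fork. Buffers are cast to float32 because MPS has no float64,
    # and .contiguous() avoids stride/view errors on MPS.
    if type(attr) == torch.Tensor:
        if attr.device != torch.device(self.device_available):
            attr = attr.to(torch.float32).to(torch.device(self.device_available)).contiguous()
    setattr(self, name, attr)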
Add a new line x = x.contiguous() after ldm/modules/attention.py#L211, so it looks like:
def _forward(self, x, context=None):
+ x = x.contiguous()
x = self.attn1(self.norm1(x)) + x
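For context, the whole method after that edit would look roughly like this (the remaining lines are assumed from the stock CompVis attention.py; only the added x = x.contiguous() is the suggested change):

def _forward(self, x, context=None):
    x = x.contiguous()  # force a contiguous layout before the attention/norm ops on MPS
    x = self.attn1(self.norm1(x)) + x
    x = self.attn2(self.norm2(x), context=context) + x
    x = self.ff(self.norm3(x)) + x
    return x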
Or, you can just modify the torch library as @magnusviri suggested in his repo.
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1 --n_rows 1 --n_iter 1
Thanks all for finding ways to make Stable Diffusion work on Apple Silicon Macs.
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Install the Rust compiler and the issue should be resolved; Rust is required to build the tokenizers wheel.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Just remove the watermark generating part in txt2img.py
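A minimal sketch of that workaround (put_watermark and WatermarkEncoder are, as far as I can tell, the names the CompVis script uses; if your copy differs, just comment out whatever builds or applies the watermark):

# In scripts/txt2img.py: comment out the watermark import and construction, e.g.
#   from imwatermark import WatermarkEncoder
#   wm_encoder = WatermarkEncoder()
# and turn the helper into a no-op so nothing from imwatermark/onnx is ever touched.
def put_watermark(img, wm_encoder=None):
    return img  # skip the invisible watermark entirely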
[MPSNDArray / MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'
Reduce width and height in order to lower memory consumption
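As a rough, hedged estimate of where square resolutions cross that limit (assuming the culprit is the UNet self-attention matrix: 8x downsampling into the latent, 8 attention heads at full latent resolution, and classifier-free guidance doubling the batch; none of this is confirmed in the thread):

# Back-of-the-envelope check against the MPS 2**31 element limit.
LIMIT = 2 ** 31

def attn_elements(width, height, n_samples=1, heads=8, guidance_batch=2):
    tokens = (width // 8) * (height // 8)  # latent spatial positions
    return n_samples * guidance_batch * heads * tokens * tokens

for side in (512, 768, 1024):
    n = attn_elements(side, side)
    print(side, n, "OK" if n <= LIMIT else "exceeds 2**31")

With these assumptions, 512 and 768 stay under the limit while 1024 does not, which is consistent with lowering --W/--H making the assertion go away.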
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Add .contiguous()
as mentioned above
@saibakatar Seems like my system is identical to yours (M1 Max, 64 GB). Hope you can find a solution.
@akaboshinit As far as I know, macOS torch doesn't support BFloat16. Could you try installing the environment again?
If it still doesn't work, try both @einanao's and @magnusviri's forks.
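A quick way to check the first part of that (a hedged sketch, assuming a recent pytorch-nightly build):

import torch

# Try to create a bfloat16 tensor on the MPS device; on builds without
# bfloat16 support for MPS this raises instead of printing the dtype/device.
try:
    t = torch.ones(2, dtype=torch.bfloat16, device="mps")
    print("bfloat16 on MPS works:", t.dtype, t.device)
except Exception as e:
    print("bfloat16 on MPS not supported here:", e)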
Error I encountered on a 14" M1 Max 64GB with https://github.com/magnusviri/stable-diffusion/tree/apple-silicon-mps-support
error: can't find Rust compiler
If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
To update pip, run:
pip install --upgrade pip
and then retry package installation.
If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Solution to the above and other issues, to get it working:
brew install rust
(not sure which of steps 1, 2, or 3 actually fixed the tokenizers problem)
Edit functional.py so the layer_norm call reads:
return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
where previously the first torch.layer_norm() parameter was plain input.
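A hypothetical alternative to editing functional.py inside site-packages is to monkey-patch it from the top of scripts/txt2img.py, which has the same effect as the edit above but survives reinstalling torch (untested sketch, not from this thread):

import torch
import torch.nn.functional as F

_orig_layer_norm = F.layer_norm

def _layer_norm_contiguous(input, normalized_shape, weight=None, bias=None, eps=1e-5):
    # Same idea as the functional.py edit: make the input contiguous before layer_norm.
    return _orig_layer_norm(input.contiguous(), normalized_shape, weight, bias, eps)

F.layer_norm = _layer_norm_contiguous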
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --ckpt 'sd-v1-4.ckpt' --H 512 --W 512 --seed 42 --n_iter 1 --ddim_steps 50
New environment-mac.yaml
name: ldm
channels:
- apple
- conda-forge
- pytorch-nightly
- defaults
dependencies:
- python=3.10.4
- pip=22.2.2
- pytorch=1.13.0.dev20220824
- torchvision
- numpy=1.23.1
- pip:
- albumentations==0.4.6
- diffusers
- opencv-python==4.6.0.66
- pudb==2019.2
- imageio==2.9.0
- imageio-ffmpeg==0.4.2
- pytorch-lightning==1.4.2
- omegaconf==2.1.1
- test-tube>=0.7.5
- streamlit>=0.73.1
- einops==0.3.0
- torch-fidelity==0.3.0
- transformers>=4.9
- torchmetrics==0.6.0
- kornia==0.6
- imwatermark==0.0.2
- invisible-watermark==0.1.5
- -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
- -e git+https://github.com/openai/CLIP.git@main#egg=clip
- -e .
@saibakatar https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1225849407
stable-diffusion %
That prompt doesn't look like you are inside a conda environment; it doesn't say (base) or (ldm). Did you run conda activate ldm? (Or something other than 'ldm' if you named the environment differently, e.g. conda env create -f environment-mac.yaml -n something_else.)
That environment you created is set up to run torch with stable diffusion on the M1 GPU.
@Birch-san @marcfon @junukwon7 @akaboshinit Doh! I forgot to include the invisible-watermark requirement in the environment-mac.yaml file. I've fixed it now. I don't know if imwatermark is required. Sorry I didn't test it yesterday. Except I still haven't tested it so I must not be too sorry... If you get around to reinstalling let me know if it works.
Following the instructions on an M1 Max MBP running macOS 12.5.1 gave an error about duplicate OpenMP versions:
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
I fixed this issue by switching to the Apple Silicon version of miniforge instead of Anaconda and using this as my environment-mac.yaml
index 22fd28b..aaf63e9 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -1,11 +1,12 @@
name: ldm
channels:
+ - conda-forge
- pytorch-nightly
- defaults
dependencies:
- python=3.10.4
- pip=22.1.2
- - pytorch
+ - pytorch=1.13.0.dev20220824
- torchvision
- numpy=1.23.1
- pip:
@magnusviri
Thanks for the reply!
I'm afraid the installation still doesn't work. Installing invisible-watermark triggers an error because one of its dependencies, onnx, has issues building its wheel (whether that's due to Python 3.10, macOS, or cmake is still unclear).
The solution is to simply strip out the watermark code. Since CompVis added invisible-watermark only two days ago, rolling that change back causes no problems.
Also, while reading through your description, I thought editing the torch library itself seemed somewhat risky. As @einanao suggested, adding .contiguous() in the model files also resolves the issue.
Thanks again for your great work!
I fixed this issue by switching to the Apple Silicon version of miniforge instead of Anaconda and using this as my environment-mac.yaml
This also worked for me, but only after also switching to the Apple Silicon version of miniforge. Making just those changes to the YAML file resulted in these MKL errors (posting in case someone else runs into this):
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.
I followed these instructions to install miniforge.
I just checked the solutions from the last 24 hours, and it looks like switching to miniforge is a must; there's no way to make it work with Anaconda. On Anaconda I can reproduce all of the listed errors.
Yeah - I'm still hitting the float64 issue running your sample code in that fork....
But I did see float32 mentioned in the thread. Maybe that will help?
https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221416526
Edit: for reference, we came here from this thread: https://github.com/pytorch/pytorch/issues/77764#issuecomment-1227121050
I have it working with Anaconda. I don't know what I've done differently. When I have time (it takes an hour just to read all the comments!) I'll try to figure something out. I'm trying to make my repo usable for all Mac users (until it hopefully gets merged).
Also, I think I'm having this issue: https://github.com/CompVis/stable-diffusion/issues/69
My images are black though. I also get crashes.
/Users/james/.conda/envs/ldm/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
I'm running this on 2020 MacBook Air M1 w/ 16GB RAM and 8-core GPU, and 2020 Mac Mini M1 w/ 16GB RAM and 8-core GPU.
I can second that it does work with Anaconda on a MacBook M1 pro.
My two cents: I followed @cannin's steps with two additions:
1. I had to update macOS Monterey to the latest version, since the PyTorch MPS implementation only works on 12.3+. I suspect this is what solved the RuntimeError: expected scalar type BFloat16 but found Float error, since I think one of the repos was silently falling back to the CPU and ending up with that error.
2. Also, I installed onnx from Anaconda before installing the pip dependencies, which I think also resolved some dependency errors.
Maybe this will help someone: I got it running on a 16" M1 Max, 32 GB (32 GPU cores). It takes around 30 seconds to generate an image (the first cosmonaut prompt gave me an NSFW warning :). The GPU spikes to 95 percent, no fans, using the sd-v1-4.ckpt model file. @einanao's version gave me CUDA errors; @magnusviri's worked. In @magnusviri's version there was a pip install error for the onnx package, which I fixed via:
brew install cmake
brew install protobuf
Thanks everyone for all of your work this is pretty amazing!
On my computer (MacBook Pro M1 with 8 GB RAM) the process takes around 25 minutes to generate an image. What's wrong? Am I using the CPU instead of the GPU? Thanks!
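One quick sanity check (a hedged sketch, not from the thread) is to confirm that the MPS backend is actually available in your environment; if it isn't, torch falls back to the CPU, which would explain generation times of that order:

import torch

print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.ones(1, device="mps")
    print("test tensor on:", x.device)  # expect mps:0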
Does anyone have img2img working using an M1/M2?
I got txt2img to work on both CPU and GPU thanks to your repos (I created a mini guide here: https://www.reddit.com/r/StableDiffusion/comments/wx0tkn/stablediffusion_runs_on_m1_chips/).
img2img gives me Error: product of dimension sizes > 2**31 with 64 GB of RAM.
Did anyone get it to work?
I got this exact error on img2img as well.
I got it to work with 256x256 image. Thanks @junukwon7 for the suggestion. Still doesn't work with 512x512 though.
I have a very weird thing going on and I'm a bit worried about my machine. I noticed there's a sound (a bit like an HDD, in some sense) coming from my laptop while the txt2img progress bar is running. The sound is in sync with the progress bar in the terminal: it starts when the bar starts and stops when it finishes. Can anyone with a 16" M1 Max or another machine relate? :) I know this sounds weird, but it works every time.
Hi,
I've heard it is possible to run Stable Diffusion on Apple Silicon (albeit slowly); it would be good to include basic setup instructions for doing this.
Thanks, Chris