I heard that PyTorch was updated to include Apple MPS support in the latest nightly release as well. Will this improve performance on M1 devices by utilizing Metal?
With Homebrew:

```bash
brew install python@3.10
pip3 install torch torchvision
pip3 install setuptools_rust
pip3 install -U git+https://github.com/huggingface/diffusers.git
pip3 install transformers scipy ftfy
```

Then start python3 and follow the instructions for using diffusers.
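As a loose illustration of that last step, here is a minimal sketch of driving Stable Diffusion through diffusers. The model id, the `use_auth_token` flag, and the output attribute are assumptions based on the standard diffusers examples and may differ between versions:

```python
from diffusers import StableDiffusionPipeline

# Assumed model id; downloading it requires accepting the license on Hugging Face
# and being logged in via `huggingface-cli login`.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
)
pipe = pipe.to("cpu")  # CPU for now; "mps" once the nightly supports the needed ops

result = pipe("a photograph of an astronaut riding a horse", num_inference_steps=50)
result.images[0].save("astronaut.png")
```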
Stable Diffusion is CPU-only on M1 Macs because not all the PyTorch ops are implemented for Metal. Generating one image with 50 steps takes 4-5 minutes.
Hi @mja,
Thanks for these steps. I can get as far as the last one, but installing transformers then fails with this error (the install of setuptools_rust was successful):
```
running build_ext
running build_rust
error: can't find Rust compiler

If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

To update pip, run:

    pip install --upgrade pip

and then retry package installation.

If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
```
For context, the first step failed to install Python 3.10 with brew, so I did it with Conda instead. Not sure if having a full Anaconda env installed is the problem.
Just tried the PyTorch nightly build with MPS support and have some good news.
On my CPU (M1 Max) it runs very slowly, almost 9 minutes per image, but with MPS enabled it's ~18x faster: less than 30 seconds per image 🤩
Incredible! Would you mind sharing your exact setup so I can duplicate on my end?
Unfortunately I got it working through many hours of trial and error, and in the end I don't know what did the trick. I'm not even a programmer; I'm just really good at googling stuff.

Basically my process was:

- change every `torch.device("cpu"/"cuda")` to `torch.device("mps")` (see the device-selection sketch below)
- move buffers to MPS: `attr = attr.to(torch.device("mps"), torch.float32)`
- in `layer_norm()` in `functional.py` (part of PyTorch, I guess), change to `return torch.layer_norm(input.contiguous(), ...`
- set `export PYTORCH_ENABLE_MPS_FALLBACK=1`
I'm sorry that I can't be more helpful than this.
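To make the first item in that list concrete, here is a minimal, generic sketch of a device-selection helper (the `get_device` name also shows up later in this thread; where exactly it would be called inside the Stable Diffusion scripts is an assumption):

```python
import torch

def get_device() -> torch.device:
    """Prefer Apple's Metal backend (MPS) when this PyTorch build supports it."""
    # Stable (non-nightly) builds may not expose torch.backends.mps at all, hence the guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

if __name__ == "__main__":
    device = get_device()
    x = torch.randn(8, 8, device=device)
    print(device, x.mean().item())
```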
Thanks. What are you currently using for checkpoints? Are you using research weights or are you using another model for now?
I don't have access to the model so I haven't tested it, but based off of what @filipux said, I created this pull request to add MPS support. If you can't wait for them to merge it, you can clone my fork, switch to the apple-silicon-mps-support branch, and try it out. Just follow the normal instructions, but instead of running `conda env create -f environment.yaml`, run `conda env create -f environment-mac.yaml`. I think the only other requirement is that you have macOS 12.3 or greater.
I couldn't quite get your fork to work @magnusviri, but based on most of @filipux's suggestions, I was able to install and generate samples on my M2 machine using https://github.com/einanao/stable-diffusion/tree/apple-silicon
Edit: If you're looking at this comment now, you probably shouldn't follow this. Apparently a lot can change in 2 weeks!
I got it to work fully natively without the CPU fallback, sort of. The way I did things is ugly since I prioritized making it work. I can't comment on speeds, but my assumption is that using only the native MPS backend is faster? I used the mps_master branch from kulinseth/pytorch as a base, since it contains an implementation for `aten::index.Tensor_out` that appears to work from what I can tell: https://github.com/Raymonf/pytorch/tree/mps_master

If you want to use my ugly changes, you'll have to compile PyTorch from scratch, as I couldn't get the CPU fallback to work:

```bash
# clone the modified mps_master branch
git clone --recursive -b mps_master https://github.com/Raymonf/pytorch.git pytorch_mps && cd pytorch_mps

# dependencies to build (including for distributed)
# slightly modified from the docs
conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses pkg-config libuv

# build pytorch with explicit USE_DISTRIBUTED=1
USE_DISTRIBUTED=1 MACOSX_DEPLOYMENT_TARGET=12.4 CC=clang CXX=clang++ python setup.py install
```

I based my version of the Stable Diffusion code on the code from PR #47's branch; you can find my fork here: https://github.com/Raymonf/stable-diffusion/tree/apple-silicon-mps-support

Just your typical `pip install -e .` should work for this; there's nothing too special going on here, it's just not what I'd call upstream-quality code by any means. I have only tested txt2img, but I did try to modify knn2img and img2img too.

Edit: It definitely takes more than 20 seconds per image at the default settings with either sampler, not sure if I did something wrong. Might be hitting https://github.com/pytorch/pytorch/issues/77799 :(
@magnusviri: You are free to take anything from my branch for yourself if it's helpful at all, thanks for the PR 😃
@Raymonf: I merged your changes with mine, so they are in the pull request now. It caught everything that I missed and is almost identical to the changes that @einanao made as well. The only difference I could see was in `ldm/models/diffusion/plms.py`.
einanao:

```python
def register_buffer(self, name, attr):
    if type(attr) == torch.Tensor:
        if attr.device != torch.device("cuda"):
            attr = attr.type(torch.float32).to(torch.device("mps")).contiguous()
```
Raymonf:

```python
def register_buffer(self, name, attr):
    if type(attr) == torch.Tensor:
        if attr.device != torch.device(self.device_available):
            attr = attr.to(torch.float32).to(torch.device(self.device_available))
```
I don't know what the code differences amount to, except that I read that adding `.contiguous()` fixes bugs when falling back to the CPU.
Pretty sure my version is redundant (I also added a downstream call to .contiguous(), but forgot to remove this one)
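For readers wondering what `.contiguous()` actually does: it copies a tensor into one contiguous block of memory, which some backend kernels require. A small generic illustration (not code from either fork):

```python
import torch

t = torch.randn(2, 3).t()      # transposing produces a non-contiguous view
print(t.is_contiguous())       # False
c = t.contiguous()             # materializes a contiguous copy with the same values
print(c.is_contiguous())       # True
print(torch.equal(t, c))       # True: only the memory layout differs
```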
@einanao Maybe not! How long does yours take to run the default seed and prompt with full precision? GNU time reports 4:52.5 (nearly 5 minutes) with the fans at 100% on a 16-inch M1 Max, which is way longer than 20 seconds. I'm curious whether your use of the CPU fallback for some parts makes it faster at all.
It takes me 1.5 minutes to generate 1 sample on a 13 inch M2 2022
I'm getting this error when trying to run with the laion400 data set:

```
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
```

Is this an issue with the torch `functional.py` script?
Yes, see @filipux's earlier comment:
> in layer_norm() in functional.py (part of pytorch I guess), change to return torch.layer_norm(input.contiguous(), ...
@einanao thank you. One step closer, but now I'm getting this:
```
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.mps.enabled)
AttributeError: module 'torch.backends.mps' has no attribute 'enabled'
```
Here is my function:

```python
def layer_norm(
    input: Tensor,
    normalized_shape: List[int],
    weight: Optional[Tensor] = None,
    bias: Optional[Tensor] = None,
    eps: float = 1e-5,
) -> Tensor:
    if has_torch_function_variadic(input, weight, bias):
        return handle_torch_function(
            layer_norm, (input.contiguous(), weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
        )
    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
```
> It takes me 1.5 minutes to generate 1 sample on a 13 inch M2 2022
For benchmarking purposes: I'm at ~150s (2.5 minutes) on each iteration past the first, which was over 500s, after setting up with the steps in these comments.

14" 2021 MacBook Pro with base specs (M1 Pro chip).
This worked for me. I'm seeing about 30 seconds per image on a 14" M1 Max MacBook Pro (32 GPU core).
> This worked for me. I'm seeing about 30 seconds per image on a 14" M1 Max MacBook Pro (32 GPU core).
What steps did you follow? I tried three Apple Silicon forks, but they are all taking ~1 hour to generate using the sample command (`python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms`). I'm using the PyTorch nightly, btw.
MPS support for `aten::index.Tensor_out` is now in the PyTorch nightly, according to Denis.
Looks like there's a ticket for the reshape error at https://github.com/pytorch/pytorch/issues/80800
> MPS support for `aten::index.Tensor_out` is now in the PyTorch nightly, according to Denis.
Is that the pytorch nightly branch? That particular branch is 1068 commits ahead and 28606 commits behind master. The last commit was 15 hours ago, but master has had commits kind of non-stop for the last 9 hours.
@henrique-galimberti I followed these steps:

- Install PyTorch nightly
- Use [this branch](https://github.com/CompVis/stable-diffusion/pull/47) referenced above from @magnusviri
- Modify `functional.py` as noted above [here](https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221667017) to resolve the "view size is not compatible" issue
Where can I find the `functional.py` file?
> Where can I find the `functional.py` file?
```python
import torch
torch.__file__
```

For me the path is below; your path will be different.

`'/Users/lab/.local/share/virtualenvs/lab-2cY4ojCF/lib/python3.10/site-packages/torch/__init__.py'`

Then replace `__init__.py` with `nn/functional.py`.
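Equivalently, a one-liner to print the full path directly (a generic sketch, not from the thread):

```python
import os
import torch

# torch.__file__ points at .../site-packages/torch/__init__.py
print(os.path.join(os.path.dirname(torch.__file__), "nn", "functional.py"))
```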
I changed the conda env to use Rosetta and it is faster than before, but still waaaay too slow.
> Is that the pytorch nightly branch? That particular branch is 1068 commits ahead and 28606 commits behind master.
It was merged 5 days ago so it should be in the regular PyTorch nightly that you can get directly from the PyTorch site.
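A quick way to confirm that the installed nightly actually ships the MPS backend (generic PyTorch; the printed values are illustrative):

```python
import torch

print(torch.__version__)                  # should be a recent nightly build
print(torch.backends.mps.is_built())      # was MPS compiled into this build?
print(torch.backends.mps.is_available())  # is the Metal device usable right now?
```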
> @henrique-galimberti I followed these steps:
>
> - Install PyTorch nightly
> - Use [this branch](https://github.com/CompVis/stable-diffusion/pull/47) referenced above from @magnusviri
> - Modify `functional.py` as noted above [here](https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221667017) to resolve the "view size is not compatible" issue
I also followed these steps and confirmed MPS was being used (I printed the return value of `get_device()`), but it's taking about 31.74 s/it, which seems very slow. After waiting a few minutes the average speed increased to 12.74 s/it.

I closed all other apps; the memory pressure graph looks like this:
When I run this on Monterey 12.5.1, Python 3.8 consumes 18.70 GB at peak
@cgodley it looks like 16GB of RAM isn’t enough to run this well. It’s slow for you because of all of the swapping, which is quite unhealthy for your SSD. If it takes 12s/iteration, I’d try CPU as I was getting ~11.5s/it with CPU inference.
@Raymonf Yes, you're correct: it's using swap, so it's not going to run smoothly for me.
Just realized I was using the default settings for `txt2img.py`, which I think does batches of 3 images at a time. I'll try doing single images and see if that helps. Otherwise I'll probably have to use Colab or AWS.
That worked.

Just tried `python scripts/txt2img.py --n_samples 1 --n_rows 1 --plms` and I'm getting 1.24 it/s now. My swap usage is about 1 GB, but the memory pressure graph is all green and I don't see much disk activity. Presumably some other apps got swapped to disk at the beginning, and then `txt2img` ran smoothly after that.

It takes about 70 seconds to make 1 image.
Nice, I am using the 2020 13" Pro w/ 16 GB RAM to generate single images (with a single sample) and I'm seeing around 2.5 s/iter. Not great, but relatively reasonable!
Has anyone managed to run `img2img`? I just did a quick test with a 512x512 input image, but there is an error message:

```
AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'
```

This error also shows up when I try to change the width or height (`--W`/`--H`) in regular txt2img.
An even simpler way to change the torch `functional.py` file is to just run this, which does the edit automatically:

```bash
python -c "
import torch
import os
filename = torch.__file__[:-11] + 'nn/functional.py'
os.system(f\"sed -i '' 's/return torch.layer_norm(input/return torch.layer_norm(input.contiguous()/g' '{filename}'\")
"
```
That said, I can't get the model to run. I'm getting an error from line 39 of txt2img.py (`model.cuda()`):
```
Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_model.encoder.layers.3.self_attn.v_proj.weight', ... several hundred more vision_model.* and projection weight names ...]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```
```
Traceback (most recent call last):
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 278, in <module>
    main()
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 187, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 39, in load_model_from_config
    model.cuda()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 128, in cuda
    device = torch.device("cuda", torch.cuda.current_device())
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/cuda/__init__.py", line 487, in current_device
    _lazy_init()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
It seems to suggest I need CUDA tools, but you can't get CUDA tools on a Mac??
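For context, the traceback comes from the hard-coded `model.cuda()` call in `load_model_from_config()`; the MPS forks replace it with a device check. A hedged sketch of that kind of change (the helper name and surrounding code are illustrative, not taken from any of the forks):

```python
import torch

def move_model_to_device(model):
    # Prefer MPS on Apple Silicon, fall back to CUDA or CPU, instead of
    # unconditionally calling model.cuda() as the upstream txt2img.py does.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return model.to(torch.device("mps"))
    if torch.cuda.is_available():
        return model.cuda()
    return model  # leave the model on the CPU
```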
@nogira, are you using Pull Request #47, e.g. `gh pr checkout 47` (gives a PYTORCH_ENABLE_MPS_FALLBACK error, not fixed by `export PYTORCH_ENABLE_MPS_FALLBACK=1`)?

Or https://github.com/Raymonf/stable-diffusion/tree/apple-silicon-mps-support, e.g. `gh repo clone Raymonf/stable-diffusion` (gives a 'Torch not compiled with CUDA enabled' error)?
@HenkPoley Oops, I forgot to change to the M1 branch after cloning.
@nogira, does #47 work for you?
@HenkPoley I installed from scratch with `conda env create -f environment-mac.yaml` and now I'm getting the error `No matching distribution found for opencv-python==4.1.2.30`.

To find a compatible version I did `pip install opencv-python==4.1.2.30`, then incremented the version until it didn't give the error (`4.3.0.38`), then changed the version in `environment-mac.yaml` and installed from scratch again, but then that version gave the same error 🙃
You can edit `environment-mac.yaml` to mention a version that does exist (no clue why I can't link to the change with my comments). Or have it say `opencv-python>=some.version.number`.
I finally got it to work by ditching the `environment.yaml` file again and just using conda/pip install directly.

It seems to be generating images (very slowly), but I still get the `Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel` warning with a giant dump of all the unused weights. Is this normal? (Is it related to "Stable Diffusion is CPU-only on M1 Macs because not all the PyTorch ops are implemented for Metal"?)
`environment-mac.yaml` is the one for Mac M1, though it has some fairly old dependencies hardcoded. I'm sure it won't work with that.
> `environment-mac.yaml` is the one for Mac M1, though it has some fairly old dependencies hardcoded. I'm sure it won't work with that.
Yeah, that's the one I used, but it didn't work.
> When I run this on Monterey 12.5.1, Python 3.8 consumes 18.70 GB at peak

I'm getting 15 GB median memory usage, and only 10 GB if you use the arguments `--n_samples 1 --n_rows 1`.
Here are more details about the steps I took last night in case anyone else gets stuck.
Filesystem paths are probably different for you and I'm missing a few steps, but roughly:
- Update conda: `conda update -y -n base -c defaults conda`
- Clone @magnusviri's fork: `git clone https://github.com/magnusviri/stable-diffusion.git`
- Switch to the MPS branch: `git checkout apple-silicon-mps-support`
- Modify `environment-mac.yaml` to fix missing/conflicted items before running `conda env create -f environment-mac.yaml`:

```diff
diff --git a/environment-mac.yaml b/environment-mac.yaml
index d923d56..a1a32d5 100644
--- a/environment-mac.yaml
+++ b/environment-mac.yaml
@@ -3,14 +3,14 @@ channels:
   - pytorch
   - defaults
 dependencies:
```
- Use the PyTorch nightly: `conda install -y pytorch torchvision torchaudio -c pytorch-nightly`
- Locate `functional.py` on disk:

```python
import torch
torch.__file__
```

`'/Users/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py'`
- Modify the `layer_norm()` return value in `~/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py`:

```diff
diff --git a/functional.py b/Users/lab/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py
index 640428d..bc36cc0 100644
--- a/functional.py
+++ b/Users/lab/miniconda3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py
@@ -2508,7 +2508,7 @@ def layer_norm(
         return handle_torch_function(
             layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
         )
-    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
+    return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
```
- Clone the weights:

```bash
brew install git
brew install git-lfs
git clone https://username:token@huggingface.co/CompVis/stable-diffusion-v-1-4-original
cd ~/stable-diffusion-v-1-4-original
git lfs pull
```

- Link to the checkpoint file:

```bash
cd ~/stable-diffusion
ln -s ../stable-diffusion-v-1-4-original/sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```

- Run `txt2img` using less memory (I only have 16GB) by setting `--n_samples 1`:

```bash
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1
```
Running this gives me the exact same image twice (stacked vertically) in the same image file:

`python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1`

like this: 🤖 🤖

Is it running the same computation on the same seed or something? How do I turn it off so my image file only contains 1 image?
@nogira `--n_iter 1`

Yeah, I was wondering the same thing; I finally figured it out this morning.
I think `--n_samples 1` actually does produce 2 different images (by default), but the grid repeats the first image. The original samples (both of them) should be in the `outputs/txt2img-samples/samples/` folder.

> rows in the grid (default: n_samples)

I suspect it might be a bug in `txt2img.py`; it seems like `n_rows` should default to `n_iter` and not to `n_samples`.
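Putting the flags from this exchange together, a single-image invocation would look something like the following (flag names are the ones used earlier in the thread; defaults may differ between forks):

```bash
# one sample, one iteration, one row in the output grid
python scripts/txt2img.py \
  --prompt "a photograph of an astronaut riding a horse" \
  --plms --n_samples 1 --n_iter 1 --n_rows 1
```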
Hi,
I've heard it is possible to run Stable Diffusion on Apple Silicon Macs (albeit slowly); it would be good to include basic setup and instructions for doing this.
Thanks, Chris