CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

Instructions for setup and running on Mac Silicon chips #25

Open crsrusl opened 2 years ago

crsrusl commented 2 years ago

Hi,

I’ve heard it is possible to run Stable Diffusion on Mac Silicon (albeit slowly); it would be good to include basic setup instructions for doing this.

Thanks, Chris

thelamedia commented 2 years ago

I heard that PyTorch was updated to include Apple MPS (Metal Performance Shaders) support in the latest nightly release as well. Will this improve performance on M1 devices by utilizing Metal?

mja commented 2 years ago

With Homebrew

brew install python@3.10
pip3 install torch torchvision
pip3 install setuptools_rust
pip3 install -U git+https://github.com/huggingface/diffusers.git
pip3 install transformers scipy ftfy

Then start python3 and follow the instructions for using diffusers.

StableDiffusion is CPU-only on M1 Macs because not all the pytorch ops are implemented for Metal. Generating one image with 50 steps takes 4-5 minutes.
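
For reference, a minimal sketch of the diffusers flow described above; the model id, auth flag, and output field are assumptions and may differ between diffusers versions:

```python
from diffusers import StableDiffusionPipeline

# Assumed model id; the weights are gated, so you need to accept the license on
# the Hugging Face Hub (older diffusers releases used use_auth_token for this).
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token=True,
)
pipe = pipe.to("cpu")  # CPU for now; not all ops run on "mps" yet

result = pipe("a photograph of an astronaut riding a horse", num_inference_steps=50)
result.images[0].save("astronaut.png")  # output field name may vary by diffusers version
```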

frenchie1980 commented 2 years ago

Hi @mja,

Thanks for these steps. I can get as far as the last one, but then installing transformers fails with this error (the install of setuptools_rust was successful):

      running build_ext
      running build_rust
      error: can't find Rust compiler

      If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.

      To update pip, run:

          pip install --upgrade pip

      and then retry package installation.

      If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

For context, the first step failed to install Python 3.10 with brew, so I did it with Conda instead. Not sure if having a full Anaconda env installed is the problem.

filipux commented 2 years ago

Just tried the PyTorch nightly build with MPS support and have some good news.

On my CPU (M1 Max) it runs very slowly, almost 9 minutes per image, but with MPS enabled it's ~18x faster: less than 30 seconds per image 🤩
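
For anyone checking whether MPS is actually in play, a quick sanity check (assumes macOS 12.3+ and a PyTorch nightly built with MPS support):

```python
import torch

# Pick MPS if the backend reports it as usable, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("using device:", device)

# Tiny smoke test on the chosen device
x = torch.randn(3, 3, device=device)
print((x @ x).device)
```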

thelamedia commented 2 years ago

Incredible! Would you mind sharing your exact setup so I can duplicate on my end?

filipux commented 2 years ago

Unfortunately I got it working after many hours of trial and error, and in the end I don't know what worked. I'm not even a programmer; I'm just really good at googling stuff.

Basically my process was:

I'm sorry that I can't be more helpful than this.

thelamedia commented 2 years ago

Thanks. What are you currently using for checkpoints? Are you using research weights or are you using another model for now?

magnusviri commented 2 years ago

I don't have access to the model so I haven't tested it, but based on what @filipux said, I created this pull request to add MPS support. If you can't wait for them to merge it, you can clone my fork and switch to the apple-silicon-mps-support branch and try it out. Just follow the normal instructions, but instead of running conda env create -f environment.yaml, run conda env create -f environment-mac.yaml. I think the only other requirement is that you have macOS 12.3 or later.

einanao commented 2 years ago

I couldn't quite get your fork to work @magnusviri, but based on most of @filipux's suggestions, I was able to install and generate samples on my M2 machine using https://github.com/einanao/stable-diffusion/tree/apple-silicon

Raymonf commented 2 years ago

Edit: If you're looking at this comment now, you probably shouldn't follow this. Apparently a lot can change in 2 weeks!

Old comment

I got it to work fully natively without the CPU fallback, sort of. The way I did things is ugly since I prioritized making it work. I can't comment on speeds but my assumption is that using only the native MPS backend is faster?

I used the mps_master branch from kulinseth/pytorch as a base, since it contains an implementation for `aten::index.Tensor_out` that appears to work from what I can tell: https://github.com/Raymonf/pytorch/tree/mps_master

If you want to use my ugly changes, you'll have to compile PyTorch from scratch as I couldn't get the CPU fallback to work:

```bash
# clone the modified mps_master branch
git clone --recursive -b mps_master https://github.com/Raymonf/pytorch.git pytorch_mps && cd pytorch_mps

# dependencies to build (including for distributed)
# slightly modified from the docs
conda install astunparse numpy ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses pkg-config libuv

# build pytorch with explicit USE_DISTRIBUTED=1
USE_DISTRIBUTED=1 MACOSX_DEPLOYMENT_TARGET=12.4 CC=clang CXX=clang++ python setup.py install
```

I based my version of the Stable Diffusion code on the code from PR #47's branch; you can find my fork here: https://github.com/Raymonf/stable-diffusion/tree/apple-silicon-mps-support

Just your typical `pip install -e .` should work for this. There's nothing too special going on here; it's just not what I'd call upstream-quality code by any means. I have only tested txt2img, but I did try to modify knn2img and img2img too.

Edit: It definitely takes more than 20 seconds per image at the default settings with either sampler, not sure if I did something wrong. Might be hitting https://github.com/pytorch/pytorch/issues/77799 :(


@magnusviri: You are free to take anything from my branch for yourself if it's helpful at all, thanks for the PR 😃

magnusviri commented 2 years ago

@Raymonf: I merged your changes with mine, so they are in the pull request now. It caught everything that I missed and is almost identical to the changes that @einanao made as well. The only difference I could see was in ldm/models/diffusion/plms.py:

einanao:

    def register_buffer(self, name, attr):
        if type(attr) == torch.Tensor:
            if attr.device != torch.device("cuda"):
                attr = attr.type(torch.float32).to(torch.device("mps")).contiguous()

Raymonf:

    def register_buffer(self, name, attr):
        if type(attr) == torch.Tensor:
            if attr.device != torch.device(self.device_available):
                attr = attr.to(torch.float32).to(torch.device(self.device_available))

I don't know whether the differences matter, except that I read that adding .contiguous() fixes bugs when falling back to the CPU.
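
For clarity, a hedged sketch of what a merged, device-agnostic version might look like; self.device_available is assumed to be set to "mps", "cuda", or "cpu" elsewhere in the sampler, and the final setattr mirrors the rest of the original method:

```python
import torch

def register_buffer(self, name, attr):
    # Cast tensor buffers to float32 and move them to the active device.
    # .contiguous() is kept defensively; it reportedly avoids issues when
    # some ops fall back to the CPU.
    if isinstance(attr, torch.Tensor):
        target = torch.device(self.device_available)  # assumed attribute
        if attr.device != target:
            attr = attr.to(torch.float32).to(target).contiguous()
    setattr(self, name, attr)
```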

einanao commented 2 years ago

Pretty sure my version is redundant (I also added a downstream call to .contiguous(), but forgot to remove this one)


Raymonf commented 2 years ago

@einanao Maybe not! How long does yours take to run the default seed and prompt with full precision? GNU time reports 4:52.5 with the fans at 100% on a 16-inch M1 Max, which is way longer than 20 seconds. I'm curious whether using the CPU fallback for some parts makes it faster at all.

einanao commented 2 years ago

It takes me 1.5 minutes to generate 1 sample on a 13 inch M2 2022

thelamedia commented 2 years ago

I'm getting this error when trying to run with the LAION-400M dataset:

return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled) RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Is this an issue with the torch functional.py script?

einanao commented 2 years ago

Yes, see @filipux's earlier comment:

in layer_norm() in functional.py (part of pytorch I guess), change to return torch.layer_norm(input.contiguous(), ...

thelamedia commented 2 years ago

@einanao thank you. One step closer, but now I'm getting this:

    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.mps.enabled)
    AttributeError: module 'torch.backends.mps' has no attribute 'enabled'

Here is my function:

    def layer_norm(
        input: Tensor,
        normalized_shape: List[int],
        weight: Optional[Tensor] = None,
        bias: Optional[Tensor] = None,
        eps: float = 1e-5,
    ) -> Tensor:
        if has_torch_function_variadic(input, weight, bias):
            return handle_torch_function(
                layer_norm, (input.contiguous(), weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
            )
        return torch.layer_norm(input.contiguous(), normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

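Side note: torch.backends.mps exposes availability checks rather than an enabled flag on MPS-capable builds (PyTorch 1.12+ or a recent nightly):

```python
import torch

print(torch.backends.mps.is_built())      # was this PyTorch compiled with MPS support?
print(torch.backends.mps.is_available())  # is an MPS device usable right now?
```
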
byhringo commented 2 years ago

It takes me 1.5 minutes to generate 1 sample on a 13 inch M2 2022

For benchmarking purposes: I'm at ~150 s (2.5 minutes) per image on every run past the first, which took over 500 s, after setting up with the steps in these comments.

14" 2021 MacBook Pro with base specs (M1 Pro chip).

Automatt commented 2 years ago

This worked for me. I'm seeing about 30 seconds per image on a 14" M1 Max MacBook Pro (32 GPU core).

henrique-galimberti commented 2 years ago

This worked for me. I'm seeing about 30 seconds per image on a 14" M1 Max MacBook Pro (32 GPU core).

What steps did you follow? I tried three Apple forks, but they are all taking ~1 h to generate using the sample command (python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms). I'm using PyTorch nightly, btw.

Automatt commented 2 years ago

@henrique-galimberti I followed these steps:

  • Install PyTorch nightly
  • Used this branch referenced above from @magnusviri
  • Modify functional.py as noted above here to resolve view size not compatible issue

recurrence commented 2 years ago

mps support for aten::index.Tensor_out is now in pytorch nightly according to Denis

recurrence commented 2 years ago

Looks like there's a ticket for the reshape error at https://github.com/pytorch/pytorch/issues/80800

magnusviri commented 2 years ago

mps support for aten::index.Tensor_out is now in pytorch nightly according to Denis

Is that the pytorch nightly branch? That particular branch is 1068 commits ahead and 28606 commits behind the master. The last commit was 15 hours ago. But master has commits kind of non-stop for the last 9 hours.

pnodseth commented 2 years ago

@henrique-galimberti I followed these steps:

  • Install PyTorch nightly
  • Used this branch referenced above from @magnusviri
  • Modify functional.py as noted above here to resolve view size not compatible issue

Where can I find the functional.py file ?

cgodley commented 2 years ago

Where can I find the functional.py file ?

import torch
torch.__file__

For me the path is below. Your path will be different.

'/Users/lab/.local/share/virtualenvs/lab-2cY4ojCF/lib/python3.10/site-packages/torch/__init__.py'

Then replace __init__.py at the end of that path with nn/functional.py to get the file you need to edit.
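
Equivalently, a small snippet that prints the exact file to edit:

```python
import os
import torch

# The functional.py that your current environment actually imports
print(os.path.join(os.path.dirname(torch.__file__), "nn", "functional.py"))
```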

henrique-galimberti commented 2 years ago

I changed the conda env to use Rosetta and it is faster than before, but still waaaay too slow: [screenshot]

recurrence commented 2 years ago

Is that the pytorch nightly branch? That particular branch is 1068 commits ahead and 28606 commits behind the master.

It was merged 5 days ago so it should be in the regular PyTorch nightly that you can get directly from the PyTorch site.

cgodley commented 2 years ago

@henrique-galimberti I followed these steps:

* Install PyTorch nightly

* Used [this branch](https://github.com/CompVis/stable-diffusion/pull/47) referenced above from @magnusviri

* Modify functional.py as noted above [here](https://github.com/CompVis/stable-diffusion/issues/25#issuecomment-1221667017) to resolve view size not compatible issue

I also followed these steps and confirmed MPS was being used (printed the return value of get_device()) but it's taking about 31.74s/it, which seems very slow.

cgodley commented 2 years ago

After waiting a few minutes, the average improved to 12.74 s/it.

I closed all other apps, mem pressure graph looks like this:

[memory pressure screenshot]

Automatt commented 2 years ago

When I run this on Monterey 12.5.1, Python 3.8 consumes 18.70 GB at peak

Raymonf commented 2 years ago

@cgodley it looks like 16GB of RAM isn’t enough to run this well. It’s slow for you because of all of the swapping, which is quite unhealthy for your SSD. If it takes 12s/iteration, I’d try CPU as I was getting ~11.5s/it with CPU inference.

cgodley commented 2 years ago

@Raymonf Yes you're correct, it's using swap so it's not going to run smoothly for me.

Just realized I was using the default settings for txt2img.py, which I think does batches of 3 images at a time. I'll try doing single images and see if that helps. Otherwise I'll probably have to use Colab or AWS

cgodley commented 2 years ago

That worked.

Just tried python scripts/txt2img.py --n_samples 1 --n_rows 1 --plms and I'm getting 1.24it/s now. My swap usage is about 1GB but the mem pressure graph is all green and I don't see much disk activity. Presumably some other apps got swapped to disk at the beginning, then txt2img ran smoothly after that.

Takes about 70 seconds to make 1 image.

angeris commented 2 years ago

Nice, I am using the 2020 13” pro w/ 16gb RAM to generate single images (with a single sample) and I’m seeing around 2.5s/iter. Not great, but relatively reasonable!

filipux commented 2 years ago

Has anyone managed to run img2img? I just did a quick test with a 512x512 input image but there is an error message:

AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion '[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'

This error also shows when I try to change the width or height (--W/--H) in regular txt2img.

ghost commented 2 years ago

An even simpler way to change the torch functional.py file is to just run this, which does it automatically:

python -c "
import torch
import os
filename = torch.__file__[:-11] + 'nn/functional.py'
os.system(f\"sed -i '' 's/return torch.layer_norm(input/return torch.layer_norm(input.contiguous()/g' '{filename}'\")
"

That said, I can't get the model to run. I'm getting an error from line 39 of txt2img.py (model.cuda()):

Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_model.encoder.layers.3.self_attn.v_proj.weight', ... (several hundred more vision_model.*, visual_projection, text_projection, and logit_scale entries omitted) ...]
- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 278, in <module>
    main()
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 187, in main
    model = load_model_from_config(config, f"{opt.ckpt}")
  File "/Users/home/github/CLONES/stable-diffusion/scripts/txt2img.py", line 39, in load_model_from_config
    model.cuda()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 128, in cuda
    device = torch.device("cuda", torch.cuda.current_device())
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/cuda/__init__.py", line 487, in current_device
    _lazy_init()
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/torch/cuda/__init__.py", line 216, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

It seems to suggest I need the CUDA toolkit, but you can't get CUDA on a Mac??

HenkPoley commented 2 years ago

@nogira are you using Pull Request #47 ? e.g. gh pr checkout 47 (gives a PYTORCH_ENABLE_MPS_FALLBACK error, not fixed by export PYTORCH_ENABLE_MPS_FALLBACK=1)

Or https://github.com/Raymonf/stable-diffusion/tree/apple-silicon-mps-support ? e.g. gh repo clone Raymonf/stable-diffusion (gives a 'Torch not compiled with CUDA enabled' error)

ghost commented 2 years ago

@HenkPoley oops, forgot to change to the m1 branch after cloning

HenkPoley commented 2 years ago

@nogira, does #47 work for you?

ghost commented 2 years ago

@HenkPoley I installed from scratch with conda env create -f environment-mac.yaml and now I'm getting the error No matching distribution found for opencv-python==4.1.2.30. To find a compatible version I did pip install opencv-python==4.1.2.30 and incremented the version until it didn't give the error (4.3.0.38), then I changed the version in environment-mac.yaml and installed from scratch again, but then that version gave the same error 🙃

HenkPoley commented 2 years ago

You can edit environment-mac.yaml to mention a version that does exist. (no clue why I can't link to the change with my comments)

Or have it say opencv-python>=some.version.number

ghost commented 2 years ago

Finally got it to work by ditching the environment.yaml file again and just using conda/pip install directly.

It seems to be generating images (very slowly), but I still get the Some weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel warning with a giant dump of all the unused weights. Is this normal? (Is this related to "StableDiffusion is CPU-only on M1 Macs because not all the pytorch ops are implemented for Metal."?)

HenkPoley commented 2 years ago

environment-mac.yaml is the one for Mac M1. Though it has some fairly old dependencies hardcoded. I'm sure it won't work with that.

ghost commented 2 years ago

environment-mac.yaml is the one for Mac M1. Though it has some fairly old dependencies hardcoded. I'm sure it won't work with that.

Yeah, that's the one I used, but it didn't work.

When I run this on Monterey 12.5.1, Python 3.8 consumes 18.70 GB at peak

I'm getting 15 GB median memory usage.

Only 10 GB if you use the arguments --n_samples 1 --n_rows 1.

cgodley commented 2 years ago

Here are more details about the steps I took last night in case anyone else gets stuck.

Filesystem paths are probably different for you and I'm missing a few steps, but roughly:

ghost commented 2 years ago

Running this gives me the exact same image twice (stacked vertically) in the same image file:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --n_samples 1

like this: 🤖 🤖

Is it running the same computation on the same seed or something? How do I turn it off so my image file only contains one image?

cgodley commented 2 years ago

@nogira --n_iter 1

Yeah I was wondering the same thing, finally figured it out this morning

cgodley commented 2 years ago

I think --n_samples 1 actually does produce 2 different images (by default), but the grid repeats the first image. The original samples (both of them) should be in the outputs/txt2img-samples/samples/ folder.

cgodley commented 2 years ago

rows in the grid (default: n_samples)

I suspect it might be a bug in txt2img.py; it seems like n_rows should default to n_iter rather than n_samples.
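
For context, the grid assembly in txt2img.py looks roughly like the sketch below (paraphrased, not a verbatim copy of the script): each of the n_iter iterations appends one (n_samples, C, H, W) batch to all_samples, and make_grid's nrow argument controls how those images are tiled, which is why the n_rows default affects the grid file but not the individual images in samples/.

```python
import numpy as np
import torch
from einops import rearrange
from torchvision.utils import make_grid
from PIL import Image

def save_grid(all_samples, n_rows, path="grid.png"):
    grid = torch.stack(all_samples, 0)                  # (n_iter, n_samples, C, H, W)
    grid = rearrange(grid, "n b c h w -> (n b) c h w")  # flatten to one long batch
    grid = make_grid(grid, nrow=n_rows)                 # nrow = images per grid row
    grid = 255.0 * rearrange(grid, "c h w -> h w c").cpu().numpy()
    Image.fromarray(grid.astype(np.uint8)).save(path)
```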