AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: Support for new 2.0 models | 768x768 resolution + new 512x512 + depth + inpainting #5011

Closed AugmentedRealityCat closed 1 year ago

AugmentedRealityCat commented 1 year ago

Is there an existing issue for this?

What would your feature do?

Support the new 768x768 model 2.0 from Stability-AI and all the other new models that just got released.

Proposed workflow

  1. Go to Stable Diffusion Checkpoint selector
  2. Select 768-v-ema.ckpt from the list
  3. Create images just like with any other model
  4. Extra "nice to have" option: set the resolution to 768x768 automatically when loading this model
  5. Add support for the new 512x512 models: base + inpainting + depth
  6. Add support for the new x4 upscaler model

Links

https://huggingface.co/stabilityai/stable-diffusion-2
https://huggingface.co/stabilityai/stable-diffusion-2-base
https://huggingface.co/stabilityai/stable-diffusion-2-depth
https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main

768 model download link on HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/768-v-ema.ckpt
512 base model download link: https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt
512 depth model download link: https://huggingface.co/stabilityai/stable-diffusion-2-depth/blob/main/512-depth-ema.ckpt
512 inpainting model download link: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt
new x4 upscaler download link: https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.ckpt

Additional information

Here is the error message you get when trying to load the 768x768 2.0 model with the current release:

Traceback (most recent call last):
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1662, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1504, in run_settings_single
    opts.data_labels[key].onchange()
  File "C:\stable-diffusion-webui-master\webui.py", line 41, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui-master\webui.py", line 83, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 291, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 182, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
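
For context, the mismatches above come from the new text encoder: SD 2.0 cross-attention expects 1024-dimensional conditioning where the 1.x architecture expects 768. A minimal, hypothetical sketch (not the webui's actual loader) of telling the two apart from the checkpoint alone:

```python
import torch

# Hypothetical helper (not the webui's actual loader): guess whether a checkpoint
# is SD 2.x or 1.x by looking at the cross-attention context dimension, which is
# exactly the axis that mismatches in the traceback above (1024 vs 768).
def guess_sd_version(ckpt_path: str) -> str:
    state = torch.load(ckpt_path, map_location="cpu")
    sd = state.get("state_dict", state)
    key = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
    return "2.x" if sd[key].shape[1] == 1024 else "1.x"

# e.g. guess_sd_version("models/Stable-diffusion/768-v-ema.ckpt") -> "2.x"
```
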
1ort commented 1 year ago

To help those who are ready to take this on: https://twitter.com/RiversHaveWings/status/1595596524431773697 and https://github.com/crowsonkb/k-diffusion/commit/4314f9101a2f3bd7f11ba4290d2a7e2e64b4ceea. As far as I understand, we only need to use this wrapper when working with 2.0 models.
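
For anyone picking this up, a rough sketch of where that wrapper would slot in, assuming an already-loaded `sd_model` and the CompVisVDenoiser class from the linked k-diffusion commit:

```python
import k_diffusion as K

# Rough sketch only: `sd_model` stands for an already-loaded LatentDiffusion model.
# The 768-v checkpoint is v-parameterised, so it needs the CompVisVDenoiser wrapper
# added in the k-diffusion commit above; eps checkpoints (1.x, 512-base) keep using
# the existing CompVisDenoiser.
def wrap_for_k_sampling(sd_model, parameterization: str = "eps"):
    if parameterization == "v":
        return K.external.CompVisVDenoiser(sd_model, quantize=True)
    return K.external.CompVisDenoiser(sd_model, quantize=True)
```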

0xdevalias commented 1 year ago

Semi-Related:

nachoal commented 1 year ago

Trying to use the wrapper here, but I realized the model loader is not even getting there; the model weights are still using the v1 512, 512 torch sizes and the new model has 4 dimensions.

To help those who are ready to take this on: https://twitter.com/RiversHaveWings/status/1595596524431773697 and crowsonkb/k-diffusion@4314f91. As far as I understand, we only need to use this wrapper when working with 2.0 models.

152334H commented 1 year ago

https://github.com/MrCheeze/stable-diffusion-webui/commit/069591b06bbbdb21624d489f3723b5f19468888d

Penagwin commented 1 year ago

For anyone on Linux (and likely Mac) who just wants to try it, here are a few things I found:

I highly recommend cloning v2 to a new folder for the moment if you just want to try it!

git clone https://github.com/MrCheeze/stable-diffusion-webui.git stable-diffusion-v2
cd stable-diffusion-v2
git checkout sd-2.0 # I tested commit 069591b06bbbdb21624d489f3723b5f19468888d specifically

After setting up a venv, installing the requirements.txt, and placing the model into models/Stable-diffusion, I was able to launch with the following command:

STABLE_DIFFUSION_REPO=https://github.com/Stability-AI/stablediffusion  STABLE_DIFFUSION_COMMIT_HASH=33910c386eaba78b7247ce84f313de0f2c314f61 python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml

image

acheong08 commented 1 year ago

I get an error about AttributeError: 'FrozenOpenCLIPEmbedder' object has no attribute 'process_text' but it seems to be working anyway, I'm not sure exactly what that's about.

This causes my instance to stop working. How did you get it to proceed?

Edit: Resolved. Removing the VRAM constraints fixed it.

AugmentedRealityCat commented 1 year ago

I confirm it works over here as well! I did not have to use the special launch command (STABLE_DIFFUSION_REPO=https://github.com/Stability-AI/stablediffusion STABLE_DIFFUSION_COMMIT_HASH=33910c386eaba78b7247ce84f313de0f2c314f61 python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml), though. I simply used the webui-user.bat launcher and it worked on the first try after installing all dependencies automatically.

acheong08 commented 1 year ago

I have got it working on Google Colab. As @Penagwin mentioned, it throws a few errors but still functions.

Note: tick the checkbox for SD1_5 rather than adding it in the "Add models" section.

acheong08 commented 1 year ago

The way it processes the text seems to be broken (AttributeError: 'FrozenOpenCLIPEmbedder' object has no attribute 'process_text'). My generations with the new models look ugly as hell.

Penagwin commented 1 year ago

@AugmentedRealityCat The command is only needed for Linux and macOS; the .bat should work for Windows.

@acheong08 Someone else should confirm this, but I believe this error comes from getting the token count for display in the UI, which is why it's not actually required for generation. If that's right, it shouldn't affect the generated image. This is the line that calls the method that errors, inside update_token_counter: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/ui.py#L443
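
If that's right, a defensive fallback along these lines (illustrative only, not the actual webui code; the assumption about what process_text returns is noted in the comment) would keep the counter from raising:

```python
# Illustrative only: the token counter calls the hijacked embedder's process_text,
# which FrozenOpenCLIPEmbedder (SD 2.0) does not have, so guard for it.
def safe_token_count(clip_embedder, text: str) -> int:
    process_text = getattr(clip_embedder, "process_text", None)
    if process_text is None:
        return 0  # counter unavailable for this embedder; generation itself is unaffected
    # Assumption: the last element of the returned tuple is the token count,
    # as in the webui's FrozenCLIPEmbedderWithCustomWords.
    return process_text([text])[-1]
```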

The first several prompts I tried were very... odd. I found 768x768 resolution made a huge difference. I also found starting a prompt from scratch might be a good idea too, just to learn the new prompting language.

I don't know for certain that it's not broken, but I was able to get a few images that I liked. I've had success with the new DPM++ SDE Karras, as well as Euler A. I am finding it a bit more difficult to get good images but I'm unsure if that's because of the v2 changes or if something is broken, or if my prompts are just bad.

Some that I liked:

masterpiece, detailed, dreaming of electric (penguins :4), scifi, concept art, (surreal), galaxy background, sharp,[fractals :1.8], [recursion :1.8] Negative prompt: blurry Steps: 5, Sampler: DPM++ SDE Karras, CFG scale: 8, Seed: 3034038969, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06

masterpiece, extremely detailed, dreaming of (electric) (penguins :4), scifi, concept art, (surreal), moon, galaxy background, sharp,[fractals :1.8], [recursion :1.8] Negative prompt: blurry Steps: 5, Sampler: DPM++ SDE Karras, CFG scale: 9, Seed: 670988386, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06

masterpiece, extremely detailed, dreaming of (electric) (penguins :2), scifi, digital concept art, (surreal), moon, galaxy background, supernova, dramatic, sharp,[fractals :1.4], [recursion :1.8] Negative prompt: blurry, painting, drawing Steps: 15, Sampler: DPM++ SDE Karras, CFG scale: 13.5, Seed: 4235446037, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06

0xdevalias commented 1 year ago

On it!

Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326063269

acheong08 commented 1 year ago

@Penagwin It seems it was bad prompting. Their new prompt system messed it up for me. Trying a few times gets me much better results

acheong08 commented 1 year ago

NSFW has been completely wrecked. It was bad on 1.5 but now it's almost impossible to get anything aesthetic. It feels like Midjourney.

They succeeded at their goal.

YakuzaSuske commented 1 year ago

NO NSFW??

acheong08 commented 1 year ago

NO NSFW??

All attempts seem to make it black and white with severely deformed limbs. The samples that make it past their filters seem to be low-quality images and abstract art.

image

image

Daviljoe193 commented 1 year ago

NO NSFW??

All attempts seem to make it black and white with severely deformed limbs. The samples that make it past their filters seem to be low-quality images and abstract art.

image

image

Try 768x768, since that's what the model's trained for. Doing what you're doing is like telling 1.5 to work at 256x256; it ain't good at resolutions lower than what it was meant for.

acheong08 commented 1 year ago

image

At 768x768

Giving up on NSFW

Daviljoe193 commented 1 year ago

@acheong08 Prompt? Kinda looks like what I got using "woman spread naked, on the beach, fullbody, nude, top-down" in 1.5. The stock model was never too good at NSFW anyway; too many gross mangled people. It did get more normal-looking results after the first attempt with this prompt, though I can't tell if posting NSFW here is against GitHub TOS, so I'll refrain from posting those.

imacopypaster commented 1 year ago

posting NSFW here is against Github TOS.

It's not porn, it's not even erotica, and not even naturalistic content at all. This is body horror.

acheong08 commented 1 year ago

posting NSFW here is against Github TOS.

It's not porn, it's not even erotica, and not even naturalistic content at all. This is body horror.

I'll refrain from posting any more here. All my attempts with 2.0 have been horrific.

Prompt?

I just copy-pasted 1.5 prompts that previously got me good results. I'll ask around on Discord; GitHub is not meant for such discussions.

ClashSAN commented 1 year ago

Do you want to hide your ugly

![hands](https://user-images.githubusercontent.com/98228077/203744212-71583e92-ab6a-4621-8678-060ab63cfb09.png) ![hands2](https://user-images.githubusercontent.com/98228077/203744213-f519d4d3-82b0-49be-a656-229d36e172b1.png) ![hands3](https://user-images.githubusercontent.com/98228077/203744215-f2191862-eb83-41dc-8b4f-75abd62f51b8.png) ![hands4](https://user-images.githubusercontent.com/98228077/203744217-532d9d21-88f7-4b22-98cd-f1591ce7b90a.png)

Daviljoe193 commented 1 year ago

Took some fighting to get the 2.0 model to work within the free tier of Colab (Kept ^Cing on me), but after restarting, it had just enough RAM free to run the GUI. Running the prompt again, with the same sampler and step count (Steps: 50, Sampler: DPM++ 2M Karras), but with the addition of some naughty bits (boobs, breasts, vagina, please GitHub don't kill me), I did get the usual stock results, about on par with what 1.5 was capable of.

boobs and stuff

![grid-0001](https://user-images.githubusercontent.com/67191631/203749564-945e5222-34fb-4cd5-a212-b12d8b6eda3a.jpg)

To say the results are good would be a complete lie, but again, they are about what you'd expect from the stock 1.5 model. What's a bit weird is that there seem to be some strange alignment issues. When it isn't mangled, it's off-center.

all4five commented 1 year ago

Kept ^Cing on me)

@Daviljoe193

How did you solve this? I'm also getting ^C with a paid plan...

Daviljoe193 commented 1 year ago

Kept ^Cing on me)

How did you solve this? I'm also getting ^C with a paid plan...

Restart the session after running everything just up to the ^Cing cell, then re-run that cell again, changing nothing. It's stupid, but that's how it is.

all4five commented 1 year ago

Thanks it worked!

0xdevalias commented 1 year ago

NSFW has been completely wrecked. It was bad on 1.5 but now it's almost impossible to get anything aesthetic.

According to the model card they HEAVILY filtered the training data before training the model (threshold of 0.1, where 1.0 is considered fully NSFW), so it's not just a filter tacked on at the end like last time.

Training Data The model developers used the following dataset for training the model:

  • LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a "p_unsafe" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic.

We currently provide the following checkpoints:

  • 512-base-ema.ckpt: 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. 850k steps at resolution 512x512 on the same dataset with resolution >= 512x512.

That said, I would assume that that would also mean that anyone who gathered a sufficient training dataset could probably finetune/dreambooth the concept back into the model.

acheong08 commented 1 year ago

they are about what you'd expect from the stock 1.5 model.

I get much better results from 1.5 on average (for nsfw). No deformations with the correct negative prompts

acheong08 commented 1 year ago

That said, I would assume that that would also mean that anyone who gathered a sufficient training dataset could probably finetune/dreambooth the concept back into the model.

The dataset is already publicly available. The issue is computational power.

manugarri commented 1 year ago

this GH issue is like a chat right now lol

acheong08 commented 1 year ago

this GH issue is like a chat right now lol

Discord for devs

acheong08 commented 1 year ago

Speaking of computational power, is distributed training of Stable Diffusion across botnets possible?

0xdevalias commented 1 year ago

https://github.com/hafriedlander/diffusers/blob/stable_diffusion_2/scripts/convert_original_stable_diffusion_to_diffusers.py

Notes:

  • Only tested on the two txt2img models, not inpaint / depth2img / upscaling
  • You will need to change your text embedding to use the penultimate layer too
  • It spits out a bunch of warnings about vision_model, but that's fine
  • I have no idea if this is right or not. It generates images, no guarantee beyond that. (Hence no PR - if you're patient, I'm sure the Diffusers team will do a better job than I have)

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326135768


Here's an example of accessing the penultimate text embedding layer https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/b34bb27cf30940f6a6a41f4b77c5b77bea11fd76/sdgrpcserver/pipeline/text_embedding/basic_text_embedding.py#L33

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326166368
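
For illustration, a sketch of the same penultimate-layer idea using the Hugging Face transformers CLIP text model (not hafriedlander's code; the model name is only an example, and SD 2.0 actually ships an OpenCLIP ViT-H text encoder, but the layer selection works the same way):

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Sketch of taking the second-to-last hidden state ("penultimate layer") instead of
# the final one, then applying the final layer norm as the webui's CLIP-skip does.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer(["a photo of a cat"], padding="max_length", return_tensors="pt")
with torch.no_grad():
    out = text_model(**tokens, output_hidden_states=True)

penultimate = out.hidden_states[-2]                               # [batch, seq, dim]
penultimate = text_model.text_model.final_layer_norm(penultimate)
```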


doesn't seem to work for me on the 768-v model using the v2 config for v

TypeError: EulerDiscreteScheduler.__init__() got an unexpected keyword argument 'prediction_type'

Originally posted by @devilismyfriend in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326220609


You need to use the absolute latest Diffusers and merge this PR (or use my branch which has it in it) https://github.com/huggingface/diffusers/pull/1386

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326243809


(My branch is at https://github.com/hafriedlander/diffusers/tree/stable_diffusion_2)

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326245339

0xdevalias commented 1 year ago

Speaking of computational power, is distributed training of Stable Diffusion across botnets possible?

While I haven't personally done anything like that myself, things like DeepSpeed, ColossalAI, etc. are basically designed for distributed training, and there are other projects like StableHorde for more ad hoc distributed stuff.

So, tl;dr: almost certainly.

ghost commented 1 year ago

Reading through the thread half awake and making the following assumptions:

  • 2.0 is possible in automatic1111?
  • But 2.0 is censored into oblivion if you want to have a female form (not talking porn, but your standard sexy character D&D art)?
  • 2.0 prompt crafting is completely different or broken in automatic1111?
  • NSFW stuff is impossible for the community to add to 2.0 unless they have a supercomputer?

Have I skimmed through the issues correctly?

Daviljoe193 commented 1 year ago

Reading through the thread half awake and making the following assumptions: 2.0 is possible in automatic1111? But 2.0 is censored into oblivion if you want to have a female form (not talking porn but your standard sexy character d&d art)? 2.0 prompt crafting is completely different or broken in automatic1111? NSFW stuff is impossible to add by the community in 2.0 unless they have a supercomputer?

Have I skimmed through the issues correctly?

Yup, for the most part. I'm not 100% sure if model training resumed from one of the previous uncensored checkpoints, or if it was restarted from scratch, as my post from before DID have some naughty bits, despite those having been filtered out of the current dataset.

0xdevalias commented 1 year ago

2.0 is possible in automatic1111?

Yup, should be.

But 2.0 is censored into oblivion if you want to have a female form (not talking porn but your standard sexy character d&d art)?

Sort of depends on how you define 'censored into oblivion', I suppose...

2.0 prompt crafting is completely different or broken in automatic1111?

I haven't played with this myself, but I did get that impression. SD v2.0 is trained using a completely different language model than v1.4/v1.5, so it would make sense to me that the exact same prompts as earlier aren't going to work the exact same way anymore.

NSFW stuff is impossible to add by the community in 2.0 unless they have a supercomputer?

I don't think that's true.

Have I skimmed through the issues correctly?

Seemingly more or less :)

0xdevalias commented 1 year ago

I'm not 100% sure if model training resumed from one of the previous uncensored checkpoints, or if it was restarted from scratch

https://github.com/Stability-AI/stablediffusion/blob/main/modelcard.md

We currently provide the following checkpoints:

  • 512-base-ema.ckpt: 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. 850k steps at resolution 512x512 on the same dataset with resolution >= 512x512.

While I don't know this for certain, that very much reads to me as though it was trained from scratch. For previous model releases they have been pretty explicit in mentioning when it was resumed from a previous checkpoint. I believe given they're using a completely different text encoder that they probably wouldn't have been able to resume from a v1.4 model either.

C43H66N12O12S2 commented 1 year ago

Apologies to everybody waiting on support for v2. I am still extremely busy with my midterms and unable to work on this in any capacity.

Remarks derived from an initial glance:

My xformers code is most likely rendered obsolete by the new native support, though my wheels are still necessary. It seems like the current CLIP hijack - and any related features - will fail as is. We'd need to ship v1-inference.yaml and modify the default config to support both older v1 models (like the huge library of dreambooth models) and v2.

Aside from those, simply switching the repo to the StabilityAI one should allow loading. Actual, proper inference would likely require updating samplers at least.
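
To sketch what that config dispatch might look like (hypothetical code, not a patch; the paths are examples, and a shape check alone cannot tell the eps-parameterised 512 base model from the v-parameterised 768 one):

```python
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config  # from the Stability-AI/stablediffusion repo

# Hypothetical paths; the point is only that each checkpoint family needs its own config.
V1_CONFIG = "configs/v1-inference.yaml"
V2_V_CONFIG = "configs/stable-diffusion/v2-inference-v.yaml"

def build_model(ckpt_path: str):
    state = torch.load(ckpt_path, map_location="cpu")
    sd = state.get("state_dict", state)
    key = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
    is_v2 = sd[key].shape[1] == 1024
    # Caveat: this cannot distinguish 512-base-ema.ckpt (eps) from 768-v-ema.ckpt (v);
    # here we simply assume a v2 checkpoint means the v-prediction config.
    config = OmegaConf.load(V2_V_CONFIG if is_v2 else V1_CONFIG)
    model = instantiate_from_config(config.model)
    model.load_state_dict(sd, strict=False)
    return model
```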

0xdevalias commented 1 year ago

Testing is in progress on the horde: https://github.com/Sygil-Dev/nataili/tree/v2. Try out Stable Diffusion 2.0 on our UIs:

https://tinybots.net/artbot https://aqualxx.github.io/stable-ui/ https://dbzer0.itch.io/lucid-creations

https://sigmoid.social/@stablehorde/109398715339480426

SD 2.0

  • [x] Initial implementation ready for testing
  • [ ] img2img
  • [ ] inpainting
  • [ ] k_diffusers support

Originally posted by @AlRlC in https://github.com/Sygil-Dev/nataili/issues/67#issuecomment-1326385645

0xdevalias commented 1 year ago

Originally posted by @0xdevalias in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326446674

0xdevalias commented 1 year ago

Should work now, make sure you check the box "redownload original model" when choosing V2

https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb

Requires more than 12GB of RAM for now, so free colab probably won't suffice.

Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326461962

uservar commented 1 year ago

Managed to get it working on free Google Colab with my fork of the automatic1111 repo, which is compatible with both the base 512 and the V 768 model (if you enable the v-prediction checkbox), and also with old models if you don't specify a --config parameter (though DDIM and PLMS sampling seem to be broken with the new Stability-AI/stablediffusion repo).

https://colab.research.google.com/drive/1ayH6PUri-vvTXhaoL3NEZr_iVvv2qosR

RaouleMenard commented 1 year ago

Hi, is it possible to just use the new "depth2img" feature in old models? This is the only improvement I'm interested in.

krisfail commented 1 year ago

Hi, is it possible to just use the new "depth2img" feature in old models? This is the only improvement I'm interested in. @RaouleMenard

There is a model in v2 that can accept depth information generated by another model, but the v1 models do not have such a feature, so it seems difficult. It would be possible to generate a mask from depth information, but that would be inherently different from v2's depth conditioning.
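
For what it's worth, a rough sketch of that workaround, estimating depth with MiDaS via torch.hub and thresholding it into an img2img mask (the model names are the public MiDaS hub entries; the 0.5 threshold and file names are arbitrary):

```python
import cv2
import numpy as np
import torch

# Sketch: estimate depth with MiDaS and threshold it into a rough mask for v1
# img2img/inpainting. Not equivalent to v2's depth model, which feeds the depth
# map into the UNet as extra conditioning.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().cpu().numpy()

depth = (depth - depth.min()) / (depth.max() - depth.min())        # normalise to 0..1
mask = (depth > 0.5).astype(np.uint8) * 255                        # keep "near" regions
mask = cv2.resize(mask, (img.shape[1], img.shape[0]), interpolation=cv2.INTER_NEAREST)
cv2.imwrite("mask.png", mask)
```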

kakaxixx commented 1 year ago

2.0 model is not good at all, rubbish work!

ghost commented 1 year ago

While that seems to be the consensus given the removal of art styles, it would still be nice to use it in automatic1111 so we can make up our own minds... it's all about choice, dear fellow.

toriato commented 1 year ago

A picture of a cat with little (orange:1.5), black, and white fur
Negative prompt: blurry
Steps: 14, Sampler: DPM++ stochastic, CFG scale: 7, Seed: 2157866423, Size: 1024x768, Model hash: 2c02b20a
code ```diff diff --git a/modules/sd_hijack.py b/modules/sd_hijack.py index eaedac1..67eac6c 100644 --- a/modules/sd_hijack.py +++ b/modules/sd_hijack.py @@ -4,6 +4,7 @@ import sys import traceback import torch import numpy as np +import open_clip from torch import einsum from torch.nn.functional import silu @@ -70,9 +71,8 @@ class StableDiffusionModelHijack: embedding_db = modules.textual_inversion.textual_inversion.EmbeddingDatabase(cmd_opts.embeddings_dir) def hijack(self, m): - model_embeddings = m.cond_stage_model.transformer.text_model.embeddings - - model_embeddings.token_embedding = EmbeddingsWithFixes(model_embeddings.token_embedding, self) + model = m.cond_stage_model.model + model.token_embedding = EmbeddingsWithFixes(model.token_embedding, self) m.cond_stage_model = FrozenCLIPEmbedderWithCustomWords(m.cond_stage_model, self) self.clip = m.cond_stage_model @@ -92,9 +92,10 @@ class StableDiffusionModelHijack: if type(m.cond_stage_model) == FrozenCLIPEmbedderWithCustomWords: m.cond_stage_model = m.cond_stage_model.wrapped - model_embeddings = m.cond_stage_model.transformer.text_model.embeddings - if type(model_embeddings.token_embedding) == EmbeddingsWithFixes: - model_embeddings.token_embedding = model_embeddings.token_embedding.wrapped + model = m.cond_stage_model.model + + if type(model.token_embedding) == EmbeddingsWithFixes: + model.token_embedding = model.token_embedding.wrapped self.apply_circular(False) self.layers = None @@ -122,12 +123,15 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): super().__init__() self.wrapped = wrapped self.hijack: StableDiffusionModelHijack = hijack - self.tokenizer = wrapped.tokenizer + self.tokenizer = open_clip.tokenizer._tokenizer # seems wrong self.token_mults = {} + + self.id_sot = self.tokenizer.encoder[''] + self.id_eot = self.tokenizer.encoder[''] - self.comma_token = [v for k, v in self.tokenizer.get_vocab().items() if k == ','][0] + self.comma_token = [v for k, v in self.tokenizer.encoder.items() if k == ','][0] - tokens_with_parens = [(k, v) for k, v in self.tokenizer.get_vocab().items() if '(' in k or ')' in k or '[' in k or ']' in k] + tokens_with_parens = [(k, v) for k, v in self.tokenizer.encoder.items() if '(' in k or ')' in k or '[' in k or ']' in k] for text, ident in tokens_with_parens: mult = 1.0 for c in text: @@ -144,14 +148,12 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): self.token_mults[ident] = mult def tokenize_line(self, line, used_custom_terms, hijack_comments): - id_end = self.wrapped.tokenizer.eos_token_id - if opts.enable_emphasis: parsed = prompt_parser.parse_prompt_attention(line) else: parsed = [[line, 1.0]] - tokenized = self.wrapped.tokenizer([text for text, _ in parsed], truncation=False, add_special_tokens=False)["input_ids"] + tokenized = list(map(self.tokenizer.encode, [text for text, _ in parsed])) fixes = [] remade_tokens = [] @@ -176,7 +178,7 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): length = len(remade_tokens) rem = int(math.ceil(length / 75)) * 75 - length - remade_tokens += [id_end] * rem + reloc_tokens + remade_tokens += [self.id_eot] * rem + reloc_tokens multipliers = multipliers[:last_comma] + [1.0] * rem + reloc_mults if embedding is None: @@ -188,7 +190,7 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): iteration = len(remade_tokens) // 75 if (len(remade_tokens) + emb_len) // 75 != iteration: rem = (75 * (iteration + 1) - len(remade_tokens)) - remade_tokens += [id_end] * rem + remade_tokens += [self.id_eot] * rem multipliers += [1.0] * 
rem iteration += 1 fixes.append((iteration, (len(remade_tokens) % 75, embedding))) @@ -201,7 +203,7 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): prompt_target_length = get_target_prompt_token_count(token_count) tokens_to_add = prompt_target_length - len(remade_tokens) - remade_tokens = remade_tokens + [id_end] * tokens_to_add + remade_tokens = remade_tokens + [self.id_eot] * tokens_to_add multipliers = multipliers + [1.0] * tokens_to_add return remade_tokens, fixes, multipliers, token_count @@ -348,17 +350,17 @@ class FrozenCLIPEmbedderWithCustomWords(torch.nn.Module): def process_tokens(self, remade_batch_tokens, batch_multipliers): if not opts.use_old_emphasis_implementation: - remade_batch_tokens = [[self.wrapped.tokenizer.bos_token_id] + x[:75] + [self.wrapped.tokenizer.eos_token_id] for x in remade_batch_tokens] + remade_batch_tokens = [[self.id_sot] + x[:75] + [self.id_eot] for x in remade_batch_tokens] batch_multipliers = [[1.0] + x[:75] + [1.0] for x in batch_multipliers] tokens = torch.asarray(remade_batch_tokens).to(device) - outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers) + z = self.wrapped.encode_with_transformer(tokens) - if opts.CLIP_stop_at_last_layers > 1: - z = outputs.hidden_states[-opts.CLIP_stop_at_last_layers] - z = self.wrapped.transformer.text_model.final_layer_norm(z) - else: - z = outputs.last_hidden_state + # if opts.CLIP_stop_at_last_layers > 1: + # z = outputs.hidden_states[-opts.CLIP_stop_at_last_layers] + # z = self.wrapped.transformer.text_model.final_layer_norm(z) + # else: + # z = outputs.last_hidden_state # restoring original mean is likely not correct, but it seems to work well to prevent artifacts that happen otherwise batch_multipliers_of_same_length = [x + [1.0] * (75 - len(x)) for x in batch_multipliers] diff --git a/modules/sd_samplers.py b/modules/sd_samplers.py index 4fe6785..fd5e092 100644 --- a/modules/sd_samplers.py +++ b/modules/sd_samplers.py @@ -33,6 +33,7 @@ samplers_k_diffusion = [ ('DPM2 a Karras', 'sample_dpm_2_ancestral', ['k_dpm_2_a_ka'], {'scheduler': 'karras'}), ('DPM++ 2S a Karras', 'sample_dpmpp_2s_ancestral', ['k_dpmpp_2s_a_ka'], {'scheduler': 'karras'}), ('DPM++ 2M Karras', 'sample_dpmpp_2m', ['k_dpmpp_2m_ka'], {'scheduler': 'karras'}), + ('DPM++ stochastic', 'sample_dpmpp_sde', ['k_dpmpp_sde'], {}) ] samplers_data_k_diffusion = [ @@ -350,7 +351,7 @@ class TorchHijack: class KDiffusionSampler: def __init__(self, funcname, sd_model): - self.model_wrap = k_diffusion.external.CompVisDenoiser(sd_model, quantize=shared.opts.enable_quantization) + self.model_wrap = k_diffusion.external.CompVisVDenoiser(sd_model, quantize=shared.opts.enable_quantization) self.funcname = funcname self.func = getattr(k_diffusion.sampling, self.funcname) self.extra_params = sampler_extra_params.get(funcname, []) ```

Fixed the attention and emphasis part; not sure how to implement the CLIP stop layers feature...
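
In case it helps, a hypothetical sketch of what a CLIP-stop analogue could look like for the OpenCLIP text tower, stopping the resblock loop early and then applying ln_final, mirroring how the webui applies final_layer_norm (function and argument names are made up):

```python
import torch

# Hypothetical sketch of "stop at last N layers" for the OpenCLIP text tower, in the
# spirit of the webui's CLIP_stop_at_last_layers option. `clip_model` is the open_clip
# model wrapped by FrozenOpenCLIPEmbedder; `tokens` are tokenised prompt ids [batch, 77].
def encode_stopping_early(clip_model, tokens: torch.Tensor, stop_at_last_layers: int = 1):
    x = clip_model.token_embedding(tokens) + clip_model.positional_embedding
    x = x.permute(1, 0, 2)                                   # NLD -> LND
    n_blocks = len(clip_model.transformer.resblocks)
    for i, block in enumerate(clip_model.transformer.resblocks):
        if i == n_blocks - stop_at_last_layers + 1:          # skip the last (N - 1) blocks
            break
        x = block(x, attn_mask=clip_model.attn_mask)
    x = x.permute(1, 0, 2)                                   # LND -> NLD
    return clip_model.ln_final(x)                            # mirrors applying final_layer_norm
```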

ghost commented 1 year ago

A picture of a cat with little (orange:1.5), black, and white fur
Negative prompt: blurry
Steps: 14, Sampler: DPM++ stochastic, CFG scale: 7, Seed: 2157866423, Size: 1024x768, Model hash: 2c02b20a

Fixed the attention and emphasis part; not sure how to implement the CLIP stop layers feature...

That's very accurate! Hmm, duplicating automatic1111 now; will update one copy in a bit.

ProGamerGov commented 1 year ago

The new Stability AI GitHub repo appears to be located here now: https://github.com/Stability-AI/stablediffusion

ghost commented 1 year ago

A picture of a cat with little (orange:1.5), black, and white fur
Negative prompt: blurry
Steps: 14, Sampler: DPM++ stochastic, CFG scale: 7, Seed: 2157866423, Size: 1024x768, Model hash: 2c02b20a

Fixed the attention and emphasis part; not sure how to implement the CLIP stop layers feature...

How did you get that to work in automatic1111?

Loading weights [2c02b20a] from D:\AIArt\NewVersion\SD\models\Stable-diffusion\768-v-ema.ckpt
Global Step: 140000
Traceback (most recent call last):
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "D:\AIArt\NewVersion\SD\modules\ui.py", line 1664, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "D:\AIArt\NewVersion\SD\modules\ui.py", line 1505, in run_settings_single
    if not opts.set(key, value):
  File "D:\AIArt\NewVersion\SD\modules\shared.py", line 454, in set
    self.data_labels[key].onchange()
  File "D:\AIArt\NewVersion\SD\webui.py", line 44, in f
    res = func(*args, **kwargs)
  File "D:\AIArt\NewVersion\SD\webui.py", line 86, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
  File "D:\AIArt\NewVersion\SD\modules\sd_models.py", line 289, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "D:\AIArt\NewVersion\SD\modules\sd_models.py", line 182, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        [... followed by the same list of size-mismatch errors as in the traceback at the top of this issue ...]