AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: Support for new 2.0 models | 768x768 resolution + new 512x512 + depth + inpainting #5011

Closed. AugmentedRealityCat closed this issue 1 year ago.

AugmentedRealityCat commented 1 year ago

Is there an existing issue for this?

What would your feature do?

Support the new 768x768 model 2.0 from Stability-AI and all the other new models that just got released.

Proposed workflow

  1. Go to Stable Diffusion Checkpoint selector
  2. Select 768-v-ema.ckpt from the list
  3. Create images just like with any other model
  4. Extra "nice to have" option: set the resolution to 768x768 automatically when loading this model
  5. Add support for the new 512x512 models: base + inpainting + depth
  6. Add support for the new x4 upscaler model

Links

https://huggingface.co/stabilityai/stable-diffusion-2
https://huggingface.co/stabilityai/stable-diffusion-2-base
https://huggingface.co/stabilityai/stable-diffusion-2-depth
https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main

768 model download link on HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/768-v-ema.ckpt
512 base model download link: https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt
512 depth model download link: https://huggingface.co/stabilityai/stable-diffusion-2-depth/blob/main/512-depth-ema.ckpt
512 inpainting model download link: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt
new x4 upscaler download link: https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.ckpt

Additional information

Here is the error message you get when trying to load the 768x768 2.0 model with the current release:

Traceback (most recent call last):
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1662, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1504, in run_settings_single
    opts.data_labels[key].onchange()
  File "C:\stable-diffusion-webui-master\webui.py", line 41, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui-master\webui.py", line 83, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 291, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 182, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
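
The mismatched shapes above are the visible symptom of 2.0's switch from the CLIP ViT-L/14 text encoder (768-dimensional conditioning) to OpenCLIP ViT-H/14 (1024-dimensional), plus a change of the transformer proj_in/proj_out layers from 1x1 convolutions to plain linear layers, so the checkpoint simply cannot be loaded into a UNet built from the 1.x config. A minimal sketch, assuming the file is an ordinary torch checkpoint with a state_dict entry (the probe key is copied from the traceback), of telling the two families apart before choosing a config:

import torch

# Cross-attention key copied from the traceback above; its last dimension is the
# text-conditioning width: 768 for SD 1.x (CLIP ViT-L), 1024 for SD 2.x (OpenCLIP ViT-H).
PROBE_KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"

def guess_sd_family(ckpt_path: str) -> str:
    """Best-effort guess of the checkpoint family; a sketch, not the webui's actual logic."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)  # some checkpoints are bare state dicts
    weight = state_dict.get(PROBE_KEY)
    if weight is None:
        return "unknown"
    return "2.x" if weight.shape[-1] == 1024 else "1.x"

# Hypothetical usage:
# guess_sd_family("models/Stable-diffusion/768-v-ema.ckpt")  # -> "2.x"
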
acheong08 commented 1 year ago

@mrgreaper Using a fork of this repo: https://github.com/acheong08/stable-diffusion-webui/tree/SDV2.0 (This is a branch. Do git checkout SDV2.0 after cloning)

To launch: COMMANDLINE_ARGS="--share --gradio-auth {gradio_username}:{gradio_password}" REQS_FILE="requirements.txt" python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml

I am using a Colab version https://colab.research.google.com/drive/1PvNyEWIhDU_D-i15DzpPjqDQkbYv_6Hu with SD2_0 checked.

ghost commented 1 year ago

@mrgreaper Using a fork of this repo: https://github.com/acheong08/stable-diffusion-webui/tree/SDV2.0 (This is a branch. Do git checkout SDV2.0 after cloning)

To launch: COMMANDLINE_ARGS="--share --gradio-auth {gradio_username}:{gradio_password}" REQS_FILE="requirements.txt" python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml

I am using a Colab version https://colab.research.google.com/drive/1PvNyEWIhDU_D-i15DzpPjqDQkbYv_6Hu with SD2_0 checked.

Ah! I thought the main program had been updated... I'll make another repo on my hard drive (these model folders are starting to add up, lol; I could not get the model-folder option to work for central model storage).

--share would be a bad idea though, that opens up access to anyone

FreeBlues commented 1 year ago

@mrgreaper Using a fork of this repo: https://github.com/acheong08/stable-diffusion-webui/tree/SDV2.0 (This is a branch. Do git checkout SDV2.0 after cloning)

To launch: COMMANDLINE_ARGS="--share --gradio-auth {gradio_username}:{gradio_password}" REQS_FILE="requirements.txt" python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml

I am using a Colab version https://colab.research.google.com/drive/1PvNyEWIhDU_D-i15DzpPjqDQkbYv_6Hu with SD2_0 checked.

Can it run on macOS? Thanks.

bbecausereasonss commented 1 year ago

The fact that they even removed nudity is pretty silly :/

MegaScience commented 1 year ago

Looking forward to learning the minimum specs needed to run these v2.0 models. I've seen 12GB mentioned here, with the suggestion that it could run on less with certain arrangements. It would be good if, when the update is ready, the currently determined minimum specs were listed. (I bring this up because I'm on a measly 1060 6GB.)

bbecausereasonss commented 1 year ago

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5037

arpitest commented 1 year ago

@acheong08 can you fix inpainting too?

Loading weights [a1385830] from /home/arpi/stable/GUI/v2/stable-diffusion-webui/models/Stable-diffusion/512-inpainting-ema.ckpt

    size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
ghost commented 1 year ago

The fact that they even removed nudity is pretty silly :/

Nudity isn't the issue; it's the removed styles that hurt. 1.5:

[image: 1.5 result for "Alyson Hannigan, playing snooker, fantasy, in a tavern, dungeons and dragons, art by Peter Mohrbacher, 16k, digital art" (before face restoration)]

vs 2.0:

[image: 2.0 result for the same prompt and seed]

I used a different sampler, as DDIM doesn't work with 2.0 in the fork I am using, but that would not make this drastic a change.

bbecausereasonss commented 1 year ago

Yikes. You mean to say they removed all the artist/styles?

eadnams22 commented 1 year ago

The fact that they even removed nudity is pretty silly :/

Nudity isn't the issue; it's the removed styles that hurt. 1.5:

[image: 1.5 result for "Alyson Hannigan, playing snooker, fantasy, in a tavern, dungeons and dragons, art by Peter Mohrbacher, 16k, digital art"]

vs 2.0:

[image: 2.0 result for the same prompt and seed]

I used a different sampler, as DDIM doesn't work with 2.0 in the fork I am using, but that would not make this drastic a change.

Prompting has also changed, so same prompt is gonna work fairly differently too.

Nsanity03 commented 1 year ago

For now I'm very disappointed. I absolutely don't like the removal of styles and NSFW content: the styles are really useful and I don't understand why they removed them. As for the NSFW content, I understand they want to remove it to avoid problems, but it doesn't follow that people would necessarily use such capabilities for malicious purposes.

ghost commented 1 year ago

Prompting has also changed, so same prompt is gonna work fairly differently too.

True... but I doubt that's the issue.

At 768x768 on the 768 model, the prompting is crazy accurate:

photo of a cat in a space suit, a realistic moon in the background, 8k

[image: the generated result for that prompt]

but at what cost?

AugementedOwl commented 1 year ago

That is why we keep the old models, and get new models from others.

If this model is better at, say, crazy stock art, that's good. We still have Anything or the previous versions for celebrities and styles.

Someone could likely re-train 2.0 using artist styles anyway if they wanted.

eadnams22 commented 1 year ago

Prompting has also changed, so same prompt is gonna work fairly differently too.

True... but I doubt that's the issue.

At 768x768 on the 768 model, the prompting is crazy accurate:

photo of a cat in a space suit, a realistic moon in the background, 8k

[image: the generated result for that prompt]

but at what cost?

From what I’ve read from emad, this is a general use base, meant for others to more easily build upon.

I think of it as the pizza crust: the intent is that we can fine-tune with our own toppings more easily than before.

Excited to see what we can make for models in the coming days, weeks, and months!

For styles, I suggest citing the actual art movement/era instead of an artist specifically. :)

uservar commented 1 year ago

@toriato

fixed attention and emphasis part, not sure how to implement the CLIP stop layers feature...

Seems like you just need to add self.wrapped.layer_idx = opts.CLIP_stop_at_last_layers before z = self.wrapped.encode_with_transformer(tokens)

Also, it seems your edits lead to the generated images changing compared to not using FrozenCLIPEmbedderWithCustomWords at all; at least that's what happened when I re-implemented it while keeping backwards compatibility at https://github.com/uservar/stable-diffusion-webui/blob/dev2/modules/sd_hijack.py
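
A minimal sketch of where that assignment would sit, assuming a wrapper shaped like the hijack class mentioned above (the attribute and option names are taken from the comments; the class itself is illustrative):

from modules.shared import opts  # webui's global options object

class FrozenOpenCLIPEmbedderWithCustomWords:  # illustrative stand-in for the fork's hijack wrapper
    def __init__(self, wrapped):
        self.wrapped = wrapped  # the open_clip-based text encoder being wrapped

    def encode_with_transformers(self, tokens):
        # Forward the "Clip skip" setting into the wrapped encoder before encoding,
        # as suggested above, so it stops at the requested hidden layer.
        self.wrapped.layer_idx = opts.CLIP_stop_at_last_layers
        return self.wrapped.encode_with_transformer(tokens)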

eadnams22 commented 1 year ago

That is why we keep the old models, and get new models from others.

If this model is better at, say, crazy stock art, that's good. We still have Anything or the previous versions for celebrities and styles.

Someone could likely re-train 2.0 using artist styles anyway if they wanted.

Yup, 1.5 and the models we’ve made haven’t gone anywhere.

ghost commented 1 year ago

That is why we keep the old models, and get new models from others. If this model is better at, say, crazy stock art, that's good. We still have Anything or the previous versions for celebrities and styles. Someone could likely re-train 2.0 using artist styles anyway if they wanted.

Yup, 1.5 and the models we’ve made haven’t gone anywhere.

I just hope that when automatic1111 is updated it will work well with both 2.0 and 1.5, and that the dreambooth extension will work well with both.

clockworkwhale commented 1 year ago

Reminder that because these models are open source, any kind of content that's missing from the base model can (and almost certainly will) simply be trained back in. The AI enthusiast community will train 2.0 on nsfw, they'll train it on Greg Rutkowski and other missing artists, etc. etc. And if you're sceptical that training can alter the model drastically enough to make it great at things that the base model sucks at, look at what NAI did with anime.

Take some more white pills, frens ❤️

ghost commented 1 year ago

Reminder that because these models are open source, any kind of content that's missing from the base model can (and almost certainly will) simply be trained back in. The AI enthusiast community will train 2.0 on nsfw, they'll train it on Greg Rutkowski and other missing artists, etc. etc. And if you're sceptical that training can alter the model drastically enough to make it great at things that the base model sucks at, look at what NAI did with anime.

Take some more white pills, frens ❤️

There is a major amount of missing content, it would take a long long time to train that all in

Nsanity03 commented 1 year ago

Reminder that because these models are open source, any kind of content that's missing from the base model can (and almost certainly will) simply be trained back in. The AI enthusiast community will train 2.0 on nsfw, they'll train it on Greg Rutkowski and other missing artists, etc. etc. And if you're sceptical that training can alter the model drastically enough to make it great at things that the base model sucks at, look at what NAI did with anime. Take some more white pills, frens ❤️

There is a major amount of missing content, it would take a long long time to train that all in

I fully agree, this will be a problem

clockworkwhale commented 1 year ago

There is a major amount of missing content, it would take a long long time to train that all in

"A long time" with the development pace that SD has had so far is like two months.

ghost commented 1 year ago

There is a major amount of missing content, it would take a long long time to train that all in

"A long time" with the development pace that SD has had so far is like two months.

Yes and no. I mean a long time as in thousands of hours of training by someone very, very competent. There is a lot missing. The NSFW stuff will no doubt get added back fast. Art styles will have less demand (sadly), will take longer, and are harder to train (in my experience). Celebrity stuff is also going to be tricky; the absence of celebs will affect how it generates non-celebs too... it's like teaching someone to talk but not teaching them certain syllables.

AugmentedRealityCat commented 1 year ago

look at what NAI did with anime.

NovelAI probably have exclusive access to the NSFW version of the 2.0 model by now.

Removing NSFW from the public model is just Stability's strategy to create artificial rarity and pump up the value of their investment in NovelAI by making their hentai-on-demand service "exclusive".

AugmentedRealityCat commented 1 year ago

Try 768x768, since that's what the model's trained for. Doing what you're doing is like telling 1.5 to work at 256x256, it ain't good at resolutions lower than what it was meant for.

Absolutely! I can confirm the 768 model is really, really bad at producing 512x512 images. I guess that's why they also released a new 512x512 model.

MrCheeze commented 1 year ago

@uservar - looks like the reason DDIM and PLMS are broken on your branch (and mine) right now is the 1.5-inpainting model hijack:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_models.py#L245 https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_hijack_inpainting.py#L327

This hijack happens unconditionally (!) and forks off an old version of those samplers.
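
In other words, the sampler replacement should only happen when the 1.5-style inpainting model is actually loaded. A rough sketch of that guard, reusing the helper names from the linked files (the wiring here is illustrative, not the eventual fix):

from modules.sd_hijack_inpainting import do_inpainting_hijack
from modules.sd_models import should_hijack_inpainting

def maybe_apply_inpainting_hijack(checkpoint_info):
    # Only fork the DDIM/PLMS samplers for the 1.5-inpainting checkpoint;
    # for everything else (including the 2.0 models) leave the stock ldm samplers alone.
    if should_hijack_inpainting(checkpoint_info):
        do_inpainting_hijack()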

SentinelAV commented 1 year ago

Styles and NSFW content aside, I'm finding that SD 2.0 has some massive improvements right out of the gate.

Using the following prompt with DPM++ 2M Karras at 1024 x 512 @ 20 steps: an (empire of the sun:1) with (colossal vertical floating golden disc carved into distant mountainside:3) and (massive worshipping crowd overlooking:1.5), (stunning concept art with immaculate sunset lighting:2), (beautiful, pathtraced photorealistic 3d render, octane, cycles, vray:2), (intricate ornate patterns:1), with (detailed grain textured shiny reflective geometry:2), (cinematic composition, colours and contrast:3), (reflections, shadows, depth:2), (caustics:2) and (filmic vignette:1)

Negative: (simple:2) (cartoon, flat:3) (low contrast lighting:2) (unrealistic:2) (ugly design:3) (outdated:2) (old:2) (boring:2) (untextured, unlit:3) (2D, 2-dimensional:5) (text:2) (split image:2) (overcast:2)

I was able to produce the following results using MrCheeze's 2.0 branch:

[four example images generated on the 2.0 branch with the prompt above]

Compared to 1.5 with identical settings (though with word-specific weights removed as the results often resembled incoherent latent noise):

[four example images generated with 1.5 using the same settings]

While 1.5 produces some pretty images, 2.0 has far higher detail and coherency, and imo, more interesting, varied and artistic compositions.

AugmentedRealityCat commented 1 year ago

While 1.5 produces some pretty images, 2.0 has far higher detail and coherency, and imo, more interesting, varied and artistic compositions.

It only makes it more frustrating to know that Stability AI voluntarily crippled it. It could have been so much more.

CarlKenner commented 1 year ago

Have a look at this implementation by @uservar https://github.com/uservar/stable-diffusion-webui/tree/dev2

Reminder that because these models are open source, any kind of content that's missing from the base model can (and almost certainly will) simply be trained back in.

No, it can't. Please stop repeating this lie. Firstly, part of the problem is the new CLIP text encoder, not just the training images. Secondly, training a neural network on new images causes it to forget how to generate the old images; you would need to train on all 5 billion images at once, and only Stability has the hardware and time to do that. Thirdly, the amount that is missing is too huge: it's missing all the training images that weren't aesthetically pleasing, plus celebrities, styles, and NSFW. Fourthly, all the previous new models we have seen were trained from a starting point with a much broader knowledge base; I don't think you can get the same results from a much narrower starting point. Fifthly, nobody has added support for training inpainting models yet, let alone depth or upscaling models. Anyone wanting to train their own DreamBooth model now has to deal with 6 or 7 separate kinds of models, none of which has all the needed features.

Stability AI really screwed up.

eadnams22 commented 1 year ago

Yup, it’ll take time to get those tools going, just like it did last time, though from a technically better starting point, it seems.

Excited to see where the community, and other devs take it now!

JackCloudman commented 1 year ago

This isn't SD 2.0, it's SD 0.2.

uservar commented 1 year ago

@uservar - looks like the reason DDIM and PLMS are broken on your branch (and mine) right now is the 1.5-inpainting model hijack:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_models.py#L245

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_hijack_inpainting.py#L327

This hijack happens unconditionally (!) and forks off an old version of those samplers.

Thanks for the heads up, seems like it is safe to remove the sampler hijacks in particular with the new Stability-AI repo. I have yet to try the new inpainting model though.

clockworkwhale commented 1 year ago

@uservar - looks like the reason DDIM and PLMS are broken on your branch (and mine) right now is the 1.5-inpainting model hijack: https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_models.py#L245

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/sd_hijack_inpainting.py#L327

This hijack happens unconditionally (!) and forks off an old version of those samplers.

Thanks for the heads up, seems like it is safe to remove the sampler hijacks in particular with the new Stability-AI repo. I have yet to try the new inpainting model though.

Your repo switches into the correct LatentInpainting mode and successfully loads the 2.0 inpainting weights without any errors if I rename the checkpoint to sd-v1-5-inpainting.ckpt. Inference runs and throws no errors either. But the actual output is borked, just a beige blur in the masked area regardless of settings.

Update: I'm dumb, forgot to uncheck Use v-prediction, had it on from when I was testing the 768 model. Works now, getting good coherent inpainted output. Too soon to tell whether it's superior to the 1.5 inpainting model.

uservar commented 1 year ago

The inpainting model should work now without needing to rename the checkpoint. For anyone who wants to try it: https://colab.research.google.com/drive/1ayH6PUri-vvTXhaoL3NEZr_iVvv2qosR

AugmentedRealityCat commented 1 year ago

Can we use your repo locally? If I understand correctly, it would be this branch: https://github.com/uservar/stable-diffusion-webui/tree/dev2

uservar commented 1 year ago

I've only tested it using Google Colab, but it should™️ work locally as well for the new models, as long as you specify the config with something like --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference.yaml

The older models work as well without that flag, but I'm not sure how to make them both work automatically.

junebug12851 commented 1 year ago

Is there any way 2.0 can be optional in settings, like picking between 1.5 and 2.0? Or will the WebUI move to 2.0 entirely, meaning you have to use an old commit if you want 1.5?

MrCheeze commented 1 year ago

I've only tested it using google colab, but it should™️ work locally as well for the new models as long as you specify the config with something like --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference.yaml

The older models work as well without that flag, but I'm not sure how to make them both work automatically

It does work, I've been pointing people towards your branch as the most developed 2.0 support right now.

The ideal behaviour would probably be to autodetect the model architecture (1.X, 1.5-inpainting, 2.0, 2.0-v, 2.0-inpainting) and to choose a default config accordingly. Failing that, for now we can just put the 2.0 yamls in the SD models folder, renamed to 512-base-ema.yaml and 768-v-ema.yaml so that they'll be used only for those two models.
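
A minimal sketch of that per-checkpoint yaml fallback (the paths are hypothetical and the eventual webui behaviour may differ):

import os

def config_for_checkpoint(checkpoint_path: str, default_config: str) -> str:
    # 768-v-ema.ckpt -> 768-v-ema.yaml sitting next to it, if present;
    # otherwise keep using the stock v1 config.
    candidate = os.path.splitext(checkpoint_path)[0] + ".yaml"
    return candidate if os.path.exists(candidate) else default_config

# Hypothetical usage:
# config_for_checkpoint("models/Stable-diffusion/768-v-ema.ckpt", "configs/v1-inference.yaml")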

By the way, I've got support for --no-half and partial support for --medvram here: https://github.com/MrCheeze/stable-diffusion-webui/commit/077611a9ecb692f093cbe07a0e84cc6f547adf24

I say partial because I couldn't figure out how to unload and load the openclip model, so I just left it loaded at all times.
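
On the --medvram point, the usual approach is to keep the large submodules on the CPU and move each one to the GPU only for its own forward pass. A generic PyTorch sketch of that pattern (the general idea only, not webui's lowvram.py):

import torch

def offload_between_calls(module: torch.nn.Module, device: str = "cuda"):
    # Keep `module` (e.g. the OpenCLIP text encoder) on the CPU and move it to the
    # GPU only while its forward pass runs, freeing VRAM in between.
    module.to("cpu")

    def load(mod, args):
        mod.to(device)

    def unload(mod, args, output):
        mod.to("cpu")
        torch.cuda.empty_cache()
        return output

    module.register_forward_pre_hook(load)
    module.register_forward_hook(unload)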

AugmentedRealityCat commented 1 year ago

Thanks a lot for your work, MrCheeze! I can't thank you enough for providing us with the first working solution to run the new model on A1111.

MrCheeze commented 1 year ago

(you should definitely be using uservar's much improved version at this point though, my commit above is a tweak of it)

MV799 commented 1 year ago

This thread reads like Twitter: so many people have opinions without even trying it. I tried SD 2.0, and 768 works fine with the repo clone posted here. Artists are there (classical ones for sure). Greg Rutkowski is gone, and it seems some other bigger current artists may be too, IDK, but not all ("artstation" does produce results). Thomas Kinkade is there.

Where it seems to shine is that the results follow the prompt much more closely. I put a woman on a bicycle and got a woman, framed well. I put two women and got two women hassling over a bicycle. (I put three, but still got just two.) Most of my random prompts were framed mostly okay, with very few cut-off heads.

Where it differs significantly is that prompts will NOT work the same way as in 1.5, since the terms were trained with different (more specific) meanings. In 2.0, "car and old house, pen and ink" results in a black-and-white ink drawing 100% of the time unless I specify colorful. It understands lettering a little better: "sign that says hello" did try to spell it and even got it right on the 3rd try.

Hands and fingers are still a BIG mess. Holding something often means it is attached to the fingers in some weird way. There are far more women of color: a plain "woman" prompt will choose a black woman about 50% of the time. The prompt "woman holding a phone" results in the most awkward acrobatics of 2 to 10 fingers wrapped around 1 to 5 phones with a blurred face behind it, 100% of the time. Not a single usable picture. Women can't hold a teacup. They just can't. "woman in armor holding a sword" consistently gives a woman with about 3 to 5 swords sticking out of her... etc.

So yes, you can call it 2.0, but this is a very rough base that someone needs to build upon, and I don't know who that someone is.

AugmentedRealityCat commented 1 year ago

(I put three, but still got just two)

I suppose that will require model 3.0!

Hands and fingers are still a BIG mess.

I'm beginning to think this is directly related to model puritanism, since the best hand results we've had so far came from "anatomical" models (coupled with the 1.5 VAE). There is no scientific basis for this, but I would not be surprised if we were to discover that more nudes = better hands, and better anatomical detail overall as well.

0xdevalias commented 1 year ago

Is there any way 2.0 can be optional in settings, like picking between 1.5 and 2.0? Or will the WebUI move to 2.0 entirely, meaning you have to use an old commit if you want 1.5?

@junebug12851 I suspect that once the initial gruntwork is done to figure out all the bits required to make it work, both the 1.x and 2.x models will be usable within the UI at the same time (by switching between them), and I doubt that users will have to stay on older versions of the code.

0xdevalias commented 1 year ago

From @pcuenca on the HF discord:

We are busy preparing a new release of diffusers to fully support Stable Diffusion 2. We are still ironing things out, but the basics already work from the main branch in github. Here's how to do it:

  • Install diffusers from github alongside its dependencies:
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
  • Use the code in this script to run your predictions:
from diffusers import DiffusionPipeline, EulerDiscreteScheduler
import torch

repo_id = "stabilityai/stable-diffusion-2"
device = "cuda"

# The 768-v checkpoint was trained with v-prediction, so the scheduler is overridden accordingly
scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler", prediction_type="v_prediction")
# fp16 weights keep VRAM usage down
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16", scheduler=scheduler)
pipe = pipe.to(device)

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, width=768, height=768, guidance_scale=9).images[0]
image.save("astronaut.png")

Originally posted by @vvvm23 in https://github.com/huggingface/diffusers/issues/1392#issuecomment-1326747275

0xdevalias commented 1 year ago

how sure are you that your conversion is correct? I'm trying to diagnose a difference I get between your 768 weights and my conversion script. There's a big difference, and in general I much prefer the results from my conversion. It seems specific to the unet - if I replace my unet with yours I get the same results.

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327018829

OK, differential diagnostic done, it's the Tokenizer. How did you create the Tokenizer at https://huggingface.co/stabilityai/stable-diffusion-2/tree/main/tokenizer? I just built a Tokenizer using AutoTokenizer.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K") - it seems to give much better results.

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327031107

I've put "my" version of the Tokenizer at https://huggingface.co/halffried/sd2-laion-clipH14-tokenizer/tree/main. You can just replace the tokenizer in any pipeline to test it if you're interested.

Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327077503
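
For anyone who wants to reproduce the comparison, swapping the tokenizer on a diffusers pipeline is straightforward. A sketch using the repos named above and mirroring the earlier example (whether the alternative tokenizer really gives better results is exactly what is being investigated here):

from diffusers import DiffusionPipeline, EulerDiscreteScheduler
from transformers import AutoTokenizer
import torch

repo_id = "stabilityai/stable-diffusion-2"
scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler", prediction_type="v_prediction")
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16", scheduler=scheduler)
pipe = pipe.to("cuda")

# Swap in a tokenizer built from the OpenCLIP ViT-H checkpoint, as described in the quoted comments
pipe.tokenizer = AutoTokenizer.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")

image = pipe("High quality photo of an astronaut riding a horse in space", width=768, height=768, guidance_scale=9).images[0]
image.save("tokenizer_test.png")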

CarlKenner commented 1 year ago

I've almost finished a proper implementation of Stable Diffusion 2.0 in Automatic1111, so that it runs locally and automatically updates everything and works on 4GB lowvram. It supports both 1.5 and 2.0 models and you can switch between models from the menu like normal.

So far the 512x512 base model, 512x512 inpainting model, and the 768x768 v-prediction model work properly. The upscaler model and depth models load correctly but don't work to generate images yet. It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped. And the PLMS sampling method isn't working. I'll push it soon.

tuwonga commented 1 year ago

I've almost finished a proper implementation of Stable Diffusion 2.0 in Automatic1111, so that it runs locally and automatically updates everything and works on 4GB lowvram. It supports both 1.5 and 2.0 models and you can switch between models from the menu like normal.

So far the 512x512 base model, 512x512 inpainting model, and the 768x768 v-prediction model work properly. The upscaler model and depth models load correctly but don't work to generate images yet. It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped. And the PLMS sampling method isn't working. I'll push it soon.

Wow, I can't wait. Thank you!

aniketgore commented 1 year ago

Thanks a lot for your work, MrCheeze! I can't thank you enough for providing us with the first working solution to run the new model on A1111.

I've almost finished a proper implementation of Stable Diffusion 2.0 in Automatic1111, so that it runs locally and automatically updates everything and works on 4GB lowvram. It supports both 1.5 and 2.0 models and you can switch between models from the menu like normal.

So far the 512x512 base model, 512x512 inpainting model, and the 768x768 v-prediction model work properly. The upscaler model and depth models load correctly but don't work to generate images yet. It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped. And the PLMS sampling method isn't working. I'll push it soon.

Great. Can't wait for it.

TheLastBen commented 1 year ago

@CarlKenner any solution for keeping RAM under 12.6GB so it can be used in Colab?

CarlKenner commented 1 year ago

https://github.com/CarlKenner/stable-diffusion-webui/tree/dev2-carl

@CarlKenner any solution for keeping RAM under 12.6GB so it can be used in Colab?

You probably know more about that than me, lol. I'm just getting it to run image generation on my 4GB Geforce GTX 970 (although it turns out @MrCheeze already did it while I was working on it). I haven't looked at training or anything like that.

aniketgore commented 1 year ago

https://github.com/CarlKenner/stable-diffusion-webui/tree/dev2-carl

@CarlKenner any solution for keeping RAM under 12.6GB so it can be used in Colab?

You probably know more about that than me, lol. I'm just getting it to run image generation on my 4GB Geforce GTX 970 (although it turns out @MrCheeze already did it while I was working on it). I haven't looked at training or anything like that.

Thanks for the link. How do we run it? I tried running uservar's branch, but I couldn't get it working. Appreciate your help.