AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: Support for new 2.0 models | 768x768 resolution + new 512x512 + depth + inpainting #5011

Closed AugmentedRealityCat closed 1 year ago

AugmentedRealityCat commented 1 year ago

Is there an existing issue for this?

What would your feature do?

Support the new 768x768 model 2.0 from Stability-AI and all the other new models that just got released.

Proposed workflow

  1. Go to Stable Diffusion Checkpoint selector
  2. Select 768-v-ema.ckpt from the list
  3. Create images just like with any other model
  4. Extra "nice to have" option: set the resolution to 768x768 automatically when loading this model
  5. Add support for the new 512x512 models: base + inpainting + depth
  6. Add support for the new x4 upscaler model

Links

https://huggingface.co/stabilityai/stable-diffusion-2
https://huggingface.co/stabilityai/stable-diffusion-2-base
https://huggingface.co/stabilityai/stable-diffusion-2-depth
https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main

768 model download link on HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/768-v-ema.ckpt
512 base model download link: https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt
512 depth model download link: https://huggingface.co/stabilityai/stable-diffusion-2-depth/blob/main/512-depth-ema.ckpt
512 inpainting model download link: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt
new x4 upscaler download link: https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.ckpt

Additional information

Here is the error message you get when trying to load the 768x768 2.0 model with the current release:

Traceback (most recent call last):
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1662, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "C:\stable-diffusion-webui-master\modules\ui.py", line 1504, in run_settings_single
    opts.data_labels[key].onchange()
  File "C:\stable-diffusion-webui-master\webui.py", line 41, in f
    res = func(*args, **kwargs)
  File "C:\stable-diffusion-webui-master\webui.py", line 83, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 291, in reload_model_weights
    load_model_weights(sd_model, checkpoint_info)
  File "C:\stable-diffusion-webui-master\modules\sd_models.py", line 182, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "C:\stable-diffusion-webui-master\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
Omegastick commented 1 year ago

It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped.

I'm 99% sure old embeddings won't work with 2.0 anyway, because of the retrained text encoder. You'll probably get nonsense.

shadinx2 commented 1 year ago

Can I presume the models are so fundamentally different that gen 1 and gen 2 can't be merged?

Edit: Also, wouldn't it be possible to just merge the weights from gen 1 and gen 2 and keep the architecture, text encoder, etc. from gen 2? Would it give nonsense, if it's possible at all?

CarlKenner commented 1 year ago

Thanks for the link. How do we run it? I tried running user2's branch, but I couldn't get it working. Appreciate your help.

It's meant to be run locally. I'm assuming you have installed the misleadingly named "SUPER Stable diffusion 2.0" (actually 1.4) according to this video's instructions from months ago: https://youtu.be/vg8-NSbaWZI or something similar that uses Automatic1111 on your own computer.

You'll need to download either 768-v-ema.ckpt (for generating images natively at 768x768), 512-base-ema.ckpt (for generating images at 512x512), or 512-inpainting-ema.ckpt (for inpainting) and put it in your A:\AI\Super SD 2.0\stable-diffusion-webui\models\Stable-diffusion folder (or whatever you called it). Don't rename the start of the file: it needs the 512- or 768-v- at the start, and it needs the word inpainting somewhere in the middle if it's an inpainting model. Or you can call it whatever you want, as long as you put the appropriate .yaml file next to it with the same name.
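
For example (illustrative only, using the folder names from above):

    models\Stable-diffusion\
        768-v-ema.ckpt            (768-v- prefix tells the UI it's the 768 v-model)
        512-inpainting-ema.ckpt   (512- prefix plus "inpainting" in the name)
        my-renamed-model.ckpt     (renamed freely...)
        my-renamed-model.yaml     (...because the matching .yaml sits next to it)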

Go into your A:\AI\Super SD 2.0\stable-diffusion-webui\ folder and use git to create a new branch (so your master branch will work when the real Automatic1111 version comes out) and switch to that branch. Then do a git pull on my https://github.com/CarlKenner/stable-diffusion-webui.git remote, with the branch set to dev2-carl

Then run webui-user.bat and wait for ages while it downloads and installs the modules it needs. And copy the IP address it gives you into your web browser. Then just use it like normal, and load whatever models you want from any Stable Diffusion version.

Of course the real version will come out soon, so you'll need to switch back to your master branch at some point in the future and do a git pull.

CarlKenner commented 1 year ago

Can I presume the models are so fundamentally different that gen 1 and gen 2 can't be merged?

Edit: Also, wouldn't it be possible to just merge the weights from gen 1 and gen 2 and keep the architecture, text encoder, etc. from gen 2? Would it give nonsense, if it's possible at all?

I don't think that's possible because the weights are based on the inputs from the text encoder, which would be totally different. Words would end up having random meanings.

acheong08 commented 1 year ago

@aniketgore The issue for you might be due to the fact that the commit hash wasn't changed for launch.py, only 'webui-user.bat'. I forked their repo and added the minor changes https://github.com/acheong08/stable-diffusion-webui/tree/SDV2.0

uservar commented 1 year ago

There's a small detail in the implementation relating to the open_clip tokenizer that we were doing differently which should now be fixed with this commit: https://github.com/uservar/stable-diffusion-webui/commit/49df7c9aca39bccec623dd54ae33fb6963e41464

from: https://github.com/mlfoundations/open_clip/blob/9d31b2ec4df6d8228f370ff20c8267ec6ba39383/src/open_clip/tokenizer.py#L174-L183

    sot_token = _tokenizer.encoder["<start_of_text>"]
    eot_token = _tokenizer.encoder["<end_of_text>"]
    all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
    result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)

    for i, tokens in enumerate(all_tokens):
        if len(tokens) > context_length:
            tokens = tokens[:context_length]  # Truncate
            tokens[-1] = eot_token
        result[i, :len(tokens)] = torch.tensor(tokens)

The tokens should be something like this:
[start_of_text, token1, token2, token3, end_of_text, 0, 0, 0, ..., 0]
instead of like this:
[start_of_text, token1, token2, token3, end_of_text, end_of_text, end_of_text, end_of_text, ..., end_of_text]
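
A toy illustration of the difference (made-up prompt token ids; 49406/49407 should be open_clip's start/end ids, and the context length is shortened to 8 here):

    sot, eot = 49406, 49407
    prompt = [320, 1125, 539]          # hypothetical BPE ids for the prompt text
    context_length = 8

    tokens = [sot] + prompt + [eot]
    right = tokens + [0] * (context_length - len(tokens))
    # -> [49406, 320, 1125, 539, 49407, 0, 0, 0]
    wrong = tokens + [eot] * (context_length - len(tokens))
    # -> [49406, 320, 1125, 539, 49407, 49407, 49407, 49407]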

There's probably still a few small bugs remaining but things are looking good in terms of supporting the 2.0 models so far.

ProGamerGov commented 1 year ago

I feel like we need a better way of determining model architecture that doesn't rely on the filename.

Razunter commented 1 year ago

Don't rename the start of the file: it needs the 512- or 768-v- at the start, and it needs the word inpainting somewhere in the middle if it's an inpainting model. Or you can call it whatever you want, as long as you put the appropriate .yaml file next to it with the same name.

Doesn't work when in a subfolder...

MrCheeze commented 1 year ago

@ carl/uservar... v-model detection is broken right now, can't use .get() in the way it's currently used. The change below fixes it.

[image]

CypherQube commented 1 year ago

@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both

Echolink50 commented 1 year ago

@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both

Same here. It's noise sometimes and a terrible image other times.

MrCheeze commented 1 year ago

@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both

Same here. It's noise sometimes and a terrible image other times.

if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.

Echolink50 commented 1 year ago

@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both

Same here. It's noise sometimes and a terrible image other times.

if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.

Thanks. Is uservar/dev2 the one to run locally? Also this is a newb question but how do I clone/pull/merge/overwrite the carlkenner branch without having to redownload all the dependencies and such

arpitest commented 1 year ago

if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.

Thanks, uservar's version works much better; I just switched to that. I wonder when some of these will be merged by @AUTOMATIC1111 into the main repo?

arpitest commented 1 year ago

Is it possible to fix the upscaler (x4-upscaler-ema.ckpt) to work in img2img? Right now it throws this error:

File "/home/arpi/stable/GUI/v2/uservar/stable-diffusion-webui/modules/sd_samplers.py", line 437, in sample_img2img xi = x + noise * sigma_sched[0] RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 3

fractal-fumbler commented 1 year ago

It's better to create an issue in the appropriate repo for the webui you are using atm.

Is it possible to fix the upscaler (x4-upscaler-ema.ckpt) to work in img2img? Right now it throws this error:

ghost commented 1 year ago

As bad as 2.0 is (and yeah that's my opinion) we could do with official support for it in automatic1111 instead of having to use forks.

Just please also support 1.5

If there is a way to use the older clip with 2.0 in stable diffusion (I don't understand that but apparently that will somehow allow use of missing content...yeah really don't understand that) then that would be a bonus to have

CypherQube commented 1 year ago

@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both

Same here. It's noise sometimes and a terrible image other times.

if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.

Yeah, that's kinda what I was saying. I think uservar's dev2 is the way to go atm.

junebug12851 commented 1 year ago

As bad as 2.0 is (and yeah that's my opinion) we could do with official support for it in automatic1111 instead of having to use forks.

Just please also support 1.5

If there is a way to use the older clip with 2.0 in stable diffusion (I don't understand that but apparently that will somehow allow use of missing content...yeah really don't understand that) then that would be a bonus to have

I agree. In the settings I would like to switch between 1.5 and 2.0, meaning the whole thing, including CLIP if possible, not just the checkpoint file or a single aspect of 2.0. I know this may not be possible, though, and I understand if it isn't. I hope this can be made to happen.

ye7iaserag commented 1 year ago

Is someone working on this? Or is there a working PR? I noticed AUTOMATIC1111 has been away for the past 4 days

junebug12851 commented 1 year ago

I'm sure with the 2.0 update everyone's just real busy figuring out how to bring this into the WebUI, since I've heard it's a pretty breaking update. I'd give it a few days, but it sounds like good progress is being made towards a PR from what I can gather.

CarlKenner commented 1 year ago

Thanks. Is uservar/dev2 the one to run locally? Also this is a newb question but how do I clone/pull/merge/overwrite the carlkenner branch without having to redownload all the dependencies and such

The dependencies are the same, as is most of the code, so you can just pull uservar/dev2 on top of the existing branch if you want. Anyway, I just updated my branch with his changes, so you could just pull my branch again.

Is someone working on this?

Lots of people are working on it.

Or is there a working PR?

I figured my code wasn't quite ready for a pull request, so I didn't create one. But currently, my dev2-carl branch or the uservar/dev2 branch is mostly working. Although it does include a few of uservar's changes that aren't directly related to SD 2.0 support, so maybe they should be separated into another branch before anyone makes a pull request.

I don't know whether it's better to merge partial support in as soon as it's ready or wait for full support before anyone does a pull request.

I noticed AUTOMATIC1111 has been away for the past 4 days

I guess there's no rush for us to make a pull request then.

is it possible to fix upscaler (x4-upscaler-ema.ckpt) to work in img2img ?

I'm sure it's possible, I just haven't had a chance to work on the upscaler much. When I first tested generating an image with the upscaler, I got an out of RAM (not VRAM) error.

I feel like we need a better way of determining model architecture that doesn't rely on the filename.

That's currently how Automatic1111 handles determining the model architecture for the inpainting model, so it's not new. Another clue we could use for detecting 2.0 models is the size: they all seem to be more than 5,000,000,000 bytes but less than 6,000,000,000. That doesn't help narrow down which of the five 2.0 architectures it is, though. Or we could look at the hashes. I don't know how to actually load and read the checkpoint file to see what architecture it contains.
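
For what it's worth, here's a rough, untested sketch of what reading the checkpoint could look like, going off the to_k/to_v shapes in the traceback at the top of this issue (it assumes the usual "state_dict" layout, and the key name is copied from that traceback):

    import torch

    def guess_sd_version(ckpt_path):
        # Reads the whole file onto CPU, so it's not instant, but it avoids
        # relying on the filename.
        checkpoint = torch.load(ckpt_path, map_location="cpu")
        state_dict = checkpoint.get("state_dict", checkpoint)

        # Cross-attention context width: 768 for v1 (CLIP ViT-L/14),
        # 1024 for v2 (OpenCLIP ViT-H) -- see the size-mismatch errors above.
        key = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
        if key not in state_dict:
            return "unknown"
        return "v2" if state_dict[key].shape[1] == 1024 else "v1"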

It's also possible (in theory, I didn't test it) to put the appropriate .yaml file beside it with the same name.

As bad as 2.0 is (and yeah that's my opinion) we could do with official support for it in automatic1111 instead of having to use forks.

These aren't really forks. There are a few other unrelated bug fixes in the uservar/dev2 branch, but mostly we're just coding support into the latest version of automatic1111 so it can become a pull request.

Just please also support 1.5

It does.

If there is a way to use the older clip with 2.0 in stable diffusion (I don't understand that but apparently that will somehow allow use of missing content...yeah really don't understand that) then that would be a bonus to have

I don't think so. The new model is trained on the outputs of the new clip, if you fed in the old clip's outputs you'd just get randomly shuffled meanings of words.

The better way to access the missing content that is supposedly in the model but not in CLIP would be to train an Aesthetic Gradient (or possibly a Textual Inversion?). Unfortunately, last time I checked, I didn't have enough VRAM to use Aesthetic Gradients, even though 4GB is enough to train them. So someone else might need to get Aesthetic Gradients working in 2.0.

It's also theoretically possible to retrain the text encoder separately from the model itself, I think a Chinese team did that to make a version that understands Chinese but still generates the same images.

But the problem isn't just the new CLIP. Don't get your hopes up that all the styles and celebrities are still fully learned somewhere in the latent space just waiting to be discovered somehow. Maybe they are, but maybe it never learned concepts it doesn't have words for. I don't know.

CypherQube commented 1 year ago

@CarlKenner What about support for CLIP guidance? So you have the option to use proper CLIP guidance instead of the frozen CLIP models in the AUTO1111 UI. Midjourney has proper CLIP guidance, so does DreamStudio, and someone on the Stability discord was also running v2.0 with proper CLIP guidance and the results seemed much better. Why can't that be supported?

ye7iaserag commented 1 year ago

@CarlKenner My proposals for determining v1 vs v2 models:

  • Separated folders for v1 and v2 under ./models or ./models/stable-diffusion.
  • Using file hashes. (Will cause hassles with new additions but easy to maintain)
  • Including inference yaml files per checkpoint with the same name to be loaded with checkpoints.

Solutions dependent on name and size are not future proof and can cause problems later since we already have some 7GB checkpoints for v1.5 and being dependent on naming means people can not rename checkpoints to organize them

ProGamerGov commented 1 year ago

Using file hashes. (Will cause hassles with new additions but easy to maintain)

This will be a problem for custom trained models like DreamBooth and finetunes.

We're using Python, so maybe a try/except block would be a better option. We could then save what arch each model uses, and use that after reloading.

We also need to plan for future models having different architectures, so that it's easy to add support for them.

Daviljoe193 commented 1 year ago

@CarlKenner My proposals for determining v1 vs v2 models:

  • Separated folders for v1 and v2 under ./models or ./models/stable-diffusion.
  • Using file hashes. (Will cause hassles with new additions but easy to maintain)
  • Including inference yaml files per checkpoint with the same name to be loaded with checkpoints.

Solutions dependent on name and size are not future proof and can cause problems later since we already have some 7GB checkpoints for v1.5 and being dependent on naming means people can not rename checkpoints to organize them

I'm a fan of the simplest solution, and having a models/stable-diffusion-v1 and models/stable-diffusion-v2 directory (Along with detection/support for the old models dir, with warnings printed informing the user of the change) sounds like the most bulletproof solution at the moment.

junebug12851 commented 1 year ago

Why not just leave it up to the user:

Provide a way for users to classify it. For example, a folder like you said, such as models/v1 or models/v2, or maybe a file extension such as .v1.ckpt / .v2.ckpt that can then be picked up automatically.

The same thing happens with VAE files: without a VAE file present under the same name, the WebUI can use a default one (such as none, or a particular one), but if one is present then you can optionally use that.

It just provides a lot of flexibility for everyone this way:

I'd even default it to v2 only on new installations, and keep it v1 if it's an existing installation, to make the update even more seamless.

[image]

ProGamerGov commented 1 year ago

We are using Python, where it's easier to ask for forgiveness than it is to ask for permission. If there's no way to know what the model architecture is, then we can simply do this:

def load_model(path):
    # load_model_with_* are hypothetical per-architecture loaders; each is
    # expected to raise if the checkpoint doesn't match its architecture.
    model = None
    try:
        model = load_model_with_v1(path)
    except Exception:
        pass
    try:
        model = load_model_with_v1_inpainting(path)
    except Exception:
        pass
    try:
        model = load_model_with_v2_512x512(path)
    except Exception:
        pass
    try:
        model = load_model_with_v2_768x768(path)
    except Exception:
        pass
    try:
        model = load_model_with_v2_depth(path)
    except Exception:
        pass
    assert model is not None, "Model architecture not recognized"
    return model

We don't need to require manually renaming the files or anything like that. Try/except blocks seem like the most user-friendly way to do this, especially for those with very little technical skill.

ye7iaserag commented 1 year ago

@ProGamerGov Wouldn't that in some cases load the model into memory, then need to flush it, and so on until it gets to the bottom of the try/except? What would we do in a future where you have 100+ different types of models? How long would someone need to wait to get a model loaded in that case?

junebug12851 commented 1 year ago

The solution I proposed above doesn't require renaming or moving, it's something optional you can do, and would otherwise work out of the box for existing installations.

A try/catch would really slow things down, as it would attempt to load the whole model as v1, then v2, and onward every single time. Also, if they release a v3 later on you'd have to add to the try/catch, so it's not flexible for the future and would slow things down even more.

ye7iaserag commented 1 year ago

Another solution would be to allow the webui to read the ./models directory recursively. This way you can group multiple checkpoints with a single inference yaml file. But if so, the separate-directories solution would be very close to that.
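
Something like this could do the recursive scan (rough sketch, untested; it pairs each .ckpt with a same-named .yaml first and otherwise falls back to a single .yaml shared by the folder):

    import os

    def find_checkpoints(models_dir="models/Stable-diffusion"):
        # Walk the models folder recursively and yield (checkpoint, config) pairs.
        for root, _, files in os.walk(models_dir):
            folder_yaml = next((f for f in files if f.endswith(".yaml")), None)
            for name in files:
                if not name.endswith(".ckpt"):
                    continue
                own_yaml = os.path.splitext(name)[0] + ".yaml"
                config = own_yaml if own_yaml in files else folder_yaml
                config_path = os.path.join(root, config) if config else None
                yield os.path.join(root, name), config_path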

ProGamerGov commented 1 year ago

The try/catch blocks would only be needed the first time you run the model. We can store the arch, and then use that when loading it.
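
Something along these lines, maybe (rough sketch; the cache location and the detect_arch() helper are made up):

    import json
    import os

    CACHE_PATH = "cache/model_arch.json"  # hypothetical location

    def get_arch(ckpt_path, detect_arch):
        # Key on path + size + mtime so we never have to hash a 5 GB file.
        stat = os.stat(ckpt_path)
        key = f"{ckpt_path}:{stat.st_size}:{int(stat.st_mtime)}"

        cache = {}
        if os.path.exists(CACHE_PATH):
            with open(CACHE_PATH) as f:
                cache = json.load(f)

        if key not in cache:
            # Slow path (e.g. the try-each-config approach) runs only once.
            cache[key] = detect_arch(ckpt_path)
            os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
            with open(CACHE_PATH, "w") as f:
                json.dump(cache, f, indent=2)

        return cache[key]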

ProGamerGov commented 1 year ago

It's also possible to get the weight names without fully loading the model, so we could simply match the list of weights to their architecture:

checkpoint = torch.load(path, map_location="cpu")

# Stable Diffusion checkpoints usually nest the tensors under "state_dict"
state_dict = checkpoint.get("state_dict", checkpoint)
weight_names = list(state_dict.keys())  # simple example

model_full = match_weight_names_to_arch(weight_names)  # hypothetical helper
model_full.load_state_dict(state_dict)

CarlKenner commented 1 year ago

Separated folders for v1 and v2 under ./models or ./models/stable-diffusion.

I'm a fan of the simplest solution, and having a models/stable-diffusion-v1 and models/stable-diffusion-v2 directory (Along with detection/support for the old models dir, with warnings printed informing the user of the change) sounds like the most bulletproof solution at the moment.

There are more than 2 different model architectures though. A lot more.

I may have accidentally implemented this feature already though, by being bad at python. 😂 Try making a subfolder called "768-v-models" and putting trained 768-v- models in there with random names. Or a subfolder called "v1" and putting v1 models in there.

We are using Python, where it's easier to ask for forgiveness than it is to ask for permission.

Loading 5GB files isn't easy, and it takes like 10 minutes. And I'm not sure it would even know if it got it wrong.

Also, the current implementation expects to know what model type each model is when it makes the list of models (even though I don't think it uses that information).

But it's an interesting idea for later.

Why not just leave it up to the user

Users are idiots (including me). And would just wonder why Stable Diffusion isn't working.

  • Have a default selected version (v1 or v2)
  • Any model not known just uses the default one.

That's not a terrible idea though.

Provide a way for users to classify it.

There already is one. Put the appropriate .yaml file next to it with the same name. Currently, that may not work with inpainting models that don't include the word "inpainting" though.

Or maybe a file extension such as .v1.ckpt .v2.ckpt then that can automatically be used.

We're already doing that, just at the start of the filename. eg. you can call your trained model "768-v-Christina Hendricks.ckpt"

junebug12851 commented 1 year ago

The try/catch blocks would only be needed the first time you run the model. We can store the arch, and then use that when loading it.

This solution does work: you can store the model hashes in a lookup file to reference, and let the AI determine the version on first run. But:

Your solution is good, though, and does abstract things away from the end user, so I think both methods are great ideas.

CarlKenner commented 1 year ago
  • Stable Diffusion already has a system in place for VAE files, and using existing systems people are familiar with translates better to the end user

Speaking of VAEs, I think Stable Diffusion 2.0 comes with some, has anyone experimented with them?

CarlKenner commented 1 year ago

By the way, please don't think I should be in charge of implementing any of these features. I'm not very familiar with how Stable Diffusion works, and barely know how to program in Python. Plus I'm currently sick, and haven't been getting much sleep. So if anyone wants to have a go at implementing or fixing things themselves, be my guest.

MrCheeze commented 1 year ago

It's also possible to get the weight names without fully loading the model, so we could simply match the list of weights to their architecture:

checkpoint = torch.load(path, map_location="cpu")

# Stable Diffusion checkpoints usually nest the tensors under "state_dict"
state_dict = checkpoint.get("state_dict", checkpoint)
weight_names = list(state_dict.keys())  # simple example

model_full = match_weight_names_to_arch(weight_names)  # hypothetical helper
model_full.load_state_dict(state_dict)

I personally think this is the correct approach. Although it does require a bit of reordering the code, since right now the code expects to know what config it's using before loading the weights, not after.

I also think detecting architecture by checkpoint filename is OK as a temporary stopgap solution - I wouldn't hold off on merging 2.0 support just because arch detection by ckpt contents isn't implemented yet.

junebug12851 commented 1 year ago

There are more than 2 different model architectures though. A lot more.

In v1, if it's not pertaining to text-to-image, then it doesn't go in my models folder. In v2, if you have downloaded models which are not text-to-image but other types, would it be wise to dump them all in the models folder? That isn't how v1 works, at least to my knowledge.

Users are idiots (including me). And would just wonder why Stable Diffusion isn't working.

Having that solution will work for precisely this case. The idea is to not make the user do anything special on update. Everything just works, everything is seamless, and it requires no tinkering with settings, renaming files, or extra reading.

Put the appropriate .yaml file next to it with the same name.

But yaml files are configuration files; I feel that'd be more complicated than just an optional filename suffix or a special directory inside models.

We're already doing that, just at the start of the filename.

But the start of the filename is the first thing people look at to see what model they're using, and it won't be sorted right.

CypherQube commented 1 year ago
  • Stable Diffusion already has a system in place for VAE files, and using existing systems people are familiar with translates better to the end user

Speaking of VAEs, I think Stable Diffusion 2.0 comes with some, has anyone experimented with them?

SD v2.0 doesn't come with VAEs but there were two new ones released about a month ago & I have been testing v2.0 with them. In my opinion they work slightly better with v2.0 than the default

CarlKenner commented 1 year ago

In v1, if it's not pertaining to text-to-image, then it doesn't go in my models folder. In v2, if you have downloaded models which are not text-to-image but other types, would it be wise to dump them all in the models folder? That isn't how v1 works, at least to my knowledge.

All those models I listed are text to image.

Having that solution will work for precisely this case. The idea is to not make the user do anything special on update. Everything just works, everything is seamless, and it requires no tinkering with settings, renaming files, or extra reading.

I don't see how.

But yaml files are configuration files

yaml files are essentially part of the model. Frankly, it would make much more sense for models to be published and distributed with their corresponding yaml file beside them. Anyway, that feature already existed, I didn't add it.

CarlKenner commented 1 year ago

SD v2.0 doesn't come with VAEs but there were two new ones released about a month ago & I have been testing v2.0 with them. In my opinion they work slightly better with v2.0 than the default

Are you sure? There are VAE folders in all the 2.0 models on huggingface.

CypherQube commented 1 year ago

SD v2.0 doesn't come with VAEs but there were two new ones released about a month ago & I have been testing v2.0 with them. In my opinion they work slightly better with v2.0 than the default

Are you sure? There are VAE folders in all the 2.0 models on huggingface.

They are .bin files in the 2.0 VAE folders; can we use .bin files? The other, month-old VAEs are .ckpt.

patrickmac110 commented 1 year ago

I keep getting errors on uservar's webui that are preventing me from even launching the UI:

it gets to this point then throws all this stuff...

making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\launch.py", line 259, in <module>
    start()
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\launch.py", line 254, in start
    webui.webui()
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\webui.py", line 150, in webui
    initialize()
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\webui.py", line 85, in initialize
    modules.sd_models.load_model()
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\modules\sd_models.py", line 286, in load_model
    sd_model = instantiate_from_config(sd_config.model)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\repositories\stable-diffusion\ldm\util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 563, in __init__
    self.instantiate_cond_stage(cond_stage_config)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 630, in instantiate_cond_stage
    model = instantiate_from_config(config)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\repositories\stable-diffusion\ldm\util.py", line 79, in instantiate_from_config
    return get_obj_from_str(config["target"])(**config.get("params", dict()))
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\repositories\stable-diffusion\ldm\modules\encoders\modules.py", line 147, in __init__
    model, _, _ = open_clip.create_model_and_transforms(arch, device=torch.device('cpu'), pretrained=version)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\open_clip\factory.py", line 201, in create_model_and_transforms
    model = create_model(
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\open_clip\factory.py", line 165, in create_model
    load_checkpoint(model, checkpoint_path)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\open_clip\factory.py", line 91, in load_checkpoint
    state_dict = load_state_dict(checkpoint_path)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\open_clip\factory.py", line 80, in load_state_dict
    checkpoint = torch.load(checkpoint_path, map_location=map_location)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\modules\safe.py", line 102, in load
    return load_with_extra(filename, *args, **kwargs)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\modules\safe.py", line 147, in load_with_extra
    return unsafe_torch_load(filename, *args, **kwargs)
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "A:\Desktop\00 AI Images\stable-diffusion-webui\venv\lib\site-packages\torch\serialization.py", line 242, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

0xdevalias commented 1 year ago

diffusers==0.9.0 with Stable Diffusion 2 is live!

https://github.com/huggingface/diffusers/releases/tag/v0.9.0

Originally posted by @anton-l in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327731012

Daviljoe193 commented 1 year ago

I've noticed that with uservar's repo on Google Colab's free tier (via this notebook, can't remember where I found it), sometimes it can load the models fine, but other times it ^Cs, even after reboots. What's weird is that it doesn't always give me trouble running the v2 models, just sometimes. Switching models on the spot seems to ^C most of the time, even switching from the bigger model to the smaller one, which makes very little sense to me.

Echolink50 commented 1 year ago

Same here with uservar. I got it working, then after a reboot it stopped working with errors like patrickmac's.

patrickmac110 commented 1 year ago

Same here with uservar. I got it working, then after a reboot it stopped working with errors like patrickmac's.

So I should reboot to double break it... to fix it!

Daviljoe193 commented 1 year ago

Same here with uservar. I got it working, then after a reboot it stopped working with errors like patrickmac's.

So I should reboot to double break it... to fix it!

I've had luck running it, and if it seems like it's taking longer than usual, stopping it, then rerunning it, without rebooting. Again, strange af.

Echolink50 commented 1 year ago

Same here with uservar. I got it working, then after a reboot it stopped working with errors like patrickmac's.

So I should reboot to double break it... to fix it!

No. Probably just go on a coke binge for a couple days til Auto1111 rises from the ashes as a glorious Phoenix and saves us.