rudimentary support for stable diffusion 2.0
https://github.com/MrCheeze/stable-diffusion-webui/commit/069591b06bbbdb21624d489f3723b5f19468888d
Originally posted by @152334H in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5011#issuecomment-1325971596
Notes:
- Only tested on the two txt2img models, not inpaint / depth2img / upscaling
- You will need to change your text embedding to use the penultimate layer too
- It spits out a bunch of warnings about vision_model, but that's fine
- I have no idea if this is right or not. It generates images, no guarantee beyond that. (Hence no PR - if you're patient, I'm sure the Diffusers team will do a better job than I have)
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326135768
Here's an example of accessing the penultimate text embedding layer https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/b34bb27cf30940f6a6a41f4b77c5b77bea11fd76/sdgrpcserver/pipeline/text_embedding/basic_text_embedding.py#L33
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326166368
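A minimal sketch of what "use the penultimate layer" means in practice, assuming the transformers CLIPTextModel API rather than the linked file's exact code; the repo id matches the one used later in this thread, and whether to apply the final layer norm afterwards varies between implementations:

```python
# Hedged sketch: take the second-to-last hidden state of the CLIP text encoder
# instead of the final one. Variable names are illustrative.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

repo_id = "stabilityai/stable-diffusion-2"
tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo_id, subfolder="text_encoder")

tokens = tokenizer("a photo of a cat", padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
with torch.no_grad():
    output = text_encoder(tokens.input_ids, output_hidden_states=True)

# hidden_states[-1] is the last transformer layer; the penultimate layer is [-2].
# Some implementations still apply the encoder's final layer norm afterwards.
penultimate = output.hidden_states[-2]
embeddings = text_encoder.text_model.final_layer_norm(penultimate)
```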
Doesn't seem to work for me on the 768-v model using the v2 config for v-prediction:
`TypeError: EulerDiscreteScheduler.__init__() got an unexpected keyword argument 'prediction_type'`
Originally posted by @devilismyfriend in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326220609
You need to use the absolute latest Diffusers and merge this PR (or use my branch which has it in it) https://github.com/huggingface/diffusers/pull/1386
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326243809
(My branch is at https://github.com/hafriedlander/diffusers/tree/stable_diffusion_2)
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326245339
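For reference, once that PR is in, the keyword that triggered the error above is simply passed when loading the scheduler; this mirrors the full example further down in this thread:

```python
# With diffusers from main (including PR #1386), prediction_type is accepted.
# Repo id taken from the example later in this thread.
from diffusers import EulerDiscreteScheduler

scheduler = EulerDiscreteScheduler.from_pretrained(
    "stabilityai/stable-diffusion-2",
    subfolder="scheduler",
    prediction_type="v_prediction",
)
```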
Testing in progress on the horde: https://github.com/Sygil-Dev/nataili/tree/v2. Try out Stable Diffusion 2.0 on our UIs:
- https://tinybots.net/artbot
- https://aqualxx.github.io/stable-ui/
- https://dbzer0.itch.io/lucid-creations
https://sigmoid.social/@stablehorde/109398715339480426
SD 2.0
- [x] Initial implementation ready for testing
- [ ] img2img
- [ ] inpainting
- [ ] k_diffusers support
Originally posted by @AlRlC in https://github.com/Sygil-Dev/nataili/issues/67#issuecomment-1326385645
- https://github.com/TheLastBen/fast-stable-diffusion/commit/11fd38bfbd2f1ed42449b37ba88ba324ff42ba43
Create pathsV2.py
- https://github.com/TheLastBen/fast-stable-diffusion/commit/fe445d986f08a1134f26f5efcd1c0829f34bc481
Support for SD V.2
- https://github.com/TheLastBen/fast-stable-diffusion/commit/da9b38010c2edc8fcccf2b0b70f321af30c0ecb8
fix
- https://github.com/TheLastBen/fast-stable-diffusion/commit/6c84728c72bd9735b0a5be4c62a292554c3b41d1
fix
- https://github.com/TheLastBen/fast-stable-diffusion/commit/04ba92b1931ab6aa0269a0516640f8874b004885
fix
- https://github.com/TheLastBen/fast-stable-diffusion/commit/ebea13401da873b3420fdf6f0fa02df567534a55
Create sd_hijackV2.py
- https://github.com/TheLastBen/fast-stable-diffusion/commit/88496f5199c82e9c5ee2ae40bc980140d8cd4ce5
Create sd_samplersV2.py
- https://github.com/TheLastBen/fast-stable-diffusion/commit/f324b3d85473d308ebeefb03de58ae6eb9070f42
fix V2
Originally posted by @0xdevalias in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326446674
Should work now; make sure you check the box "redownload original model" when choosing V2.
Requires more than 12GB of RAM for now, so free Colab probably won't suffice.
Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326461962
From @pcuenca on the HF discord:
We are busy preparing a new release of `diffusers` to fully support Stable Diffusion 2. We are still ironing things out, but the basics already work from the main branch on GitHub. Here's how to do it:
- Install `diffusers` from GitHub alongside its dependencies:
```bash
pip install --upgrade git+https://github.com/huggingface/diffusers.git transformers accelerate scipy
```
- Use the code in this script to run your predictions:
```python
from diffusers import DiffusionPipeline, EulerDiscreteScheduler
import torch

repo_id = "stabilityai/stable-diffusion-2"
device = "cuda"

scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler", prediction_type="v_prediction")
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16", scheduler=scheduler)
pipe = pipe.to(device)

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, width=768, height=768, guidance_scale=9).images[0]
image.save("astronaut.png")
```
Originally posted by @vvvm23 in https://github.com/huggingface/diffusers/issues/1392#issuecomment-1326747275
how sure are you that your conversion is correct? I'm trying to diagnose a difference I get between your 768 weights and my conversion script. There's a big difference, and in general I much prefer the results from my conversion. It seems specific to the unet - if I replace my unet with yours I get the same results.
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327018829
OK, differential diagnostic done, it's the Tokenizer. How did you create the Tokenizer at https://huggingface.co/stabilityai/stable-diffusion-2/tree/main/tokenizer? I just built a Tokenizer using `AutoTokenizer.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")`; it seems to give much better results.
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327031107
I've put "my" version of the Tokenizer at https://huggingface.co/halffried/sd2-laion-clipH14-tokenizer/tree/main. You can just replace the tokenizer in any pipeline to test it if you're interested.
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327077503
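A minimal sketch of what "replace the tokenizer in any pipeline" could look like, assuming diffusers pipeline components can be swapped by attribute assignment; the base pipeline setup follows the example earlier in this thread, and the output filename is illustrative:

```python
# Hedged sketch: load an SD2 pipeline as usual, then swap in the alternative
# tokenizer and compare outputs against the stock one.
from diffusers import DiffusionPipeline
from transformers import AutoTokenizer

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2")
pipe.tokenizer = AutoTokenizer.from_pretrained("halffried/sd2-laion-clipH14-tokenizer")

image = pipe("High quality photo of an astronaut riding a horse in space").images[0]
image.save("astronaut_alt_tokenizer.png")
```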
When will DreamBooth support SD2?
`diffusers==0.9.0` with Stable Diffusion 2 is live! https://github.com/huggingface/diffusers/releases/tag/v0.9.0
Originally posted by @anton-l in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327731012
I've almost finished a proper implementation of Stable Diffusion 2.0 in Automatic1111, so that it runs locally and automatically updates everything and works on 4GB lowvram. It supports both 1.5 and 2.0 models and you can switch between models from the menu like normal.
So far the 512x512 base model, 512x512 inpainting model, and the 768x768 v-prediction model work properly. The upscaler model and depth models load correctly but don't work to generate images yet. It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped. And the PLMS sampling method isn't working. I'll push it soon.
Originally posted by @CarlKenner in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5011#issuecomment-1327367090
When will DreamBooth support SD2?
While it's not DreamBooth, this repo seems to have support for finetuning SDv2:
Added support for inference and finetuning with the SD 2.0 base model (inpainting is still unsupported).
And looking at the huggingface/diffusers repo, there are a few issues that seem to imply people may be getting DreamBooth working with that (or at least trying to), e.g.:
Since this repo is unfortunately dead, take a look at this: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb?fbclid=IwAR38oVRcHjIKsOTGoBKoyY7a9XEVM9vvEtZGrgREFw36oxyeYrhMWYfIjhM#scrollTo=O3KHGKqyeJp9
For the depth model, the https://github.com/epitaque/dreambooth_depth2img repo does it by generating a depth map for every input image; it would be great to see this integrated into this repo.
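Not the linked repo's actual code, but a sketch of the general idea (one depth map per training image), assuming the transformers depth-estimation pipeline; the DPT checkpoint and file names are illustrative:

```python
# Hedged sketch: produce a depth map for a training image using a generic
# depth-estimation model. The linked repo may use a different depth network.
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

image = Image.open("training_image.png")
result = depth_estimator(image)
result["depth"].save("training_image_depth.png")  # PIL image of the predicted depth map
```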
What would your feature do?
Support the new 768x768 2.0 model from Stability-AI and all the other new models that just got released.
Links
See Also