To help those who are ready to take this on: https://twitter.com/RiversHaveWings/status/1595596524431773697 https://github.com/crowsonkb/k-diffusion/commit/4314f9101a2f3bd7f11ba4290d2a7e2e64b4ceea As far as I understand, we only need to use this wrapper if we are working with 2.0 models.
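For anyone wiring this up, a minimal sketch of what using that wrapper could look like. It assumes k-diffusion exposes a CompVisVDenoiser for v-prediction models alongside the existing CompVisDenoiser (naming per the commit linked above); treat it as a sketch, not the webui's actual implementation.

```python
# Hedged sketch: pick the right k-diffusion denoiser wrapper per model type.
# CompVisVDenoiser (v-prediction, SD 2.0 768-v) vs CompVisDenoiser (eps, v1.x).
import k_diffusion as K

def wrap_for_k_diffusion(inner_model, parameterization: str):
    if parameterization == "v":  # assumption: flag read from the model config
        return K.external.CompVisVDenoiser(inner_model)
    return K.external.CompVisDenoiser(inner_model)
```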
Semi-Related:
Trying to use the wrapper here, but I realized the model loader is not even getting that far; the model weights are still being loaded with the v1 [512, 512] torch sizes, while the new model's tensors have 4 dimensions.
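If you want to see that shape mismatch for yourself, here's a quick sketch for inspecting the checkpoint (the key name is taken from the RuntimeError posted later in this thread):

```python
# Peek at one of the offending tensors in the 2.0 checkpoint.
import torch

sd = torch.load("768-v-ema.ckpt", map_location="cpu")["state_dict"]
w = sd["model.diffusion_model.input_blocks.1.1.proj_in.weight"]
print(w.shape)  # torch.Size([320, 320]) in 2.0 (a Linear weight),
                # while the v1 code expects a Conv2d weight [320, 320, 1, 1]
```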
Anyone on Linux (and likely Mac) who just wants to try it, a few things I found. I highly recommend cloning v2 to a new folder for the moment if you just want to try it!

git clone https://github.com/MrCheeze/stable-diffusion-webui.git stable-diffusion-v2
cd stable-diffusion-v2
git checkout sd-2.0 # I tested commit 069591b06bbbdb21624d489f3723b5f19468888d specifically

After setting up a venv, installing the requirements.txt, and placing the model into models/Stable-diffusion, I was able to launch with the following command:

STABLE_DIFFUSION_REPO=https://github.com/Stability-AI/stablediffusion STABLE_DIFFUSION_COMMIT_HASH=33910c386eaba78b7247ce84f313de0f2c314f61 python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml

I get an error about AttributeError: 'FrozenOpenCLIPEmbedder' object has no attribute 'process_text', but it seems to be working anyway; I'm not sure exactly what that's about. EDIT: This appears to be related to getting the token count for the GUI, but I don't think it affects generation.
This causes my instance to stop working. How did you get it to proceed?
Edit: Resolved; removing the VRAM constraints fixed it.
I confirm it works over here as well! I did not have to use the special launch command though (STABLE_DIFFUSION_REPO=https://github.com/Stability-AI/stablediffusion STABLE_DIFFUSION_COMMIT_HASH=33910c386eaba78b7247ce84f313de0f2c314f61 python launch.py --config repositories/stable-diffusion/configs/stable-diffusion/v2-inference-v.yaml). I simply used the webui-user.bat launcher, and it worked the first time after installing all dependencies automatically.
I have got it working on Google Colab. As @Penagwin mentioned, it throws a few errors but still functions.
Note: tick the SD1_5 checkbox rather than adding it in the "Add models" section.
The way it processes the text seems to be broken (AttributeError: 'FrozenOpenCLIPEmbedder' object has no attribute 'process_text'). My generations with the new models look ugly as hell.
@AugmentedRealityCat The command is only needed for Linux and macOS; the .bat should work for Windows.
@acheong08 Someone else should confirm this, but I believe this error comes from getting the token count to display in the UI, which is why it's not actually required for generation. If that's right, I don't think it affects the actual generated image. This is the line that calls the method that errors, and it's inside update_token_counter:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/828438b4a190759807f9054932cae3a8b880ddf1/modules/ui.py#L443
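For what it's worth, counting tokens for the OpenCLIP encoder can be done without process_text at all. A rough sketch (count_tokens is a hypothetical helper, not webui code):

```python
# Hypothetical helper: count prompt tokens via open_clip's tokenizer, since
# FrozenOpenCLIPEmbedder has no process_text. Zeros are padding in the
# fixed-length (77-token) output of open_clip.tokenize.
import open_clip

def count_tokens(prompt: str) -> int:
    tokens = open_clip.tokenize([prompt])[0]
    return int((tokens != 0).sum().item())
```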
The first several prompts I tried were very... odd. I found 768x768 resolution made a huge difference. I also found starting a prompt from scratch might be a good idea too, just to learn the new prompting language.
I don't know for certain that it's not broken, but I was able to get a few images that I liked.
I've had success with the new DPM++ SDE Karras sampler, as well as Euler a. I am finding it a bit more difficult to get good images, but I'm unsure if that's because of the v2 changes, because something is broken, or because my prompts are just bad.
Some that I liked:
masterpiece, detailed, dreaming of electric (penguins :4), scifi, concept art, (surreal), galaxy background, sharp,[fractals :1.8], [recursion :1.8] Negative prompt: blurry Steps: 5, Sampler: DPM++ SDE Karras, CFG scale: 8, Seed: 3034038969, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06
masterpiece, extremely detailed, dreaming of (electric) (penguins :4), scifi, concept art, (surreal), moon, galaxy background, sharp,[fractals :1.8], [recursion :1.8] Negative prompt: blurry Steps: 5, Sampler: DPM++ SDE Karras, CFG scale: 9, Seed: 670988386, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06
masterpiece, extremely detailed, dreaming of (electric) (penguins :2), scifi, digital concept art, (surreal), moon, galaxy background, supernova, dramatic, sharp,[fractals :1.4], [recursion :1.8] Negative prompt: blurry, painting, drawing Steps: 15, Sampler: DPM++ SDE Karras, CFG scale: 13.5, Seed: 4235446037, Size: 768x768, Model hash: 2c02b20a, Eta: 0.06
On it!
Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326063269
@Penagwin It seems it was bad prompting. Their new prompt system messed it up for me. Trying a few times gets me much better results
NSFW has been completely wrecked. It was bad on 1.5 but now it's almost impossible to get anything aesthetic. It feels like Midjourney.
They succeeded at their goal.
NO NSFW??
All attempts seem to come out black and white with severely deformed limbs. The samples that make it past their filters seem to be low-quality images and abstract art.
Try 768x768, since that's what the model's trained for. Doing what you're doing is like telling 1.5 to work at 256x256, it ain't good at resolutions lower than what it was meant for.
At 768x768
Giving up on NSFW
@acheong08 Prompt? Kinda looks like what I got using "woman spread naked, on the beach, fullbody, nude, top-down" in 1.5. The stock model was never too good at NSFW anyway, too many gross mangled people. It did get more normal-looking results after the first attempt with this prompt, though I can't tell if posting NSFW here is against GitHub TOS, so I'll refrain from posting those.
posting NSFW here is against GitHub TOS.
It's not porn, it's not even erotica, and not even naturalistic content at all. This is body horror.
I'll refrain from posting any more here. All my attempts with 2.0 have been horrific.
Prompt?
I just copy-pasted 1.5 prompts that got me good results previously. I'll ask around in Discord; GitHub is not meant for such discussions.
Do you want to hide your ugly
Took some fighting to get the 2.0 model to work within the free tier of Colab (kept ^Cing on me), but after restarting, it had just enough RAM free to run the GUI. Running the prompt again, with the same sampler and step count (Steps: 50, Sampler: DPM++ 2M Karras), but with the addition of some naughty bits (boobs, breasts, vagina, please GitHub don't kill me), I did get the usual stock results, somewhere about on par with what 1.5 was capable of.
![grid-0001](https://user-images.githubusercontent.com/67191631/203749564-945e5222-34fb-4cd5-a212-b12d8b6eda3a.jpg)
To say the results are good would be a complete lie, but again, they are about what you'd expect from the stock 1.5 model. What's a bit weird is that there seem to be some strange alignment issues: when it isn't mangled, it's off-center.
Kept ^Cing on me
@Daviljoe193 How did you solve this? I'm also getting ^C with a paid plan...
Restart the session after running everything just up to the ^Cing cell, then re-run that cell again, changing nothing. It's stupid, but that's how it is.
Thanks it worked!
NSFW has been completely wrecked. It was bad on 1.5 but now it's almost impossible to get anything aesthetic.
According to the model card they HEAVILY filtered the training data before training the model (threshold of 0.1, where 1.0 is considered fully NSFW), so it's not just a filter tacked on at the end like last time.
Training Data
The model developers used the following dataset for training the model:
- LAION-5B and subsets (details below). The training data is further filtered using LAION's NSFW detector, with a "p_unsafe" score of 0.1 (conservative). For more details, please refer to LAION-5B's NeurIPS 2022 paper and reviewer discussions on the topic.
We currently provide the following checkpoints:
- 512-base-ema.ckpt: 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. 850k steps at resolution 512x512 on the same dataset with resolution >= 512x512.
That said, I would assume this also means that anyone who gathered a sufficient training dataset could probably finetune/DreamBooth the concept back into the model.
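As a toy illustration of what that filtering criterion amounts to on the LAION metadata side (the column names 'punsafe' and 'aesthetic' are assumptions, not a quote of their pipeline):

```python
# Toy sketch of the model card's filtering rule: keep rows with
# punsafe <= 0.1 and an aesthetic score >= 4.5. Column names assumed.
import pandas as pd

meta = pd.read_parquet("laion_subset_metadata.parquet")  # hypothetical file
kept = meta[(meta["punsafe"] <= 0.1) & (meta["aesthetic"] >= 4.5)]
print(f"kept {len(kept)} of {len(meta)} rows")
```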
they are about what you'd expect from the stock 1.5 model.
I get much better results from 1.5 on average (for NSFW). No deformations with the correct negative prompts.
That said, I would assume this also means that anyone who gathered a sufficient training dataset could probably finetune/DreamBooth the concept back into the model.
The dataset is already publicly available. The issue is computational power.
this GH issue is like a chat right now lol
Discord for devs
Speaking of computational power, is distributed training of Stable Diffusion across botnets possible?
Notes:
- Only tested on the two txt2img models, not inpaint / depth2img / upscaling
- You will need to change your text embedding to use the penultimate layer too
- It spits out a bunch of warnings about vision_model, but that's fine
- I have no idea if this is right or not. It generates images, no guarantee beyond that. (Hence no PR - if you're patient, I'm sure the Diffusers team will do a better job than I have)
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326135768
Here's an example of accessing the penultimate text embedding layer https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/b34bb27cf30940f6a6a41f4b77c5b77bea11fd76/sdgrpcserver/pipeline/text_embedding/basic_text_embedding.py#L33
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326166368
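For those on diffusers/transformers rather than the grpcserver code above, a minimal sketch of grabbing the penultimate layer (assuming the SD 2.0 repo's text_encoder loads as a transformers CLIPTextModel):

```python
# Sketch: use the penultimate hidden state of the text encoder, then apply
# the final layer norm, mirroring the linked basic_text_embedding.py.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

repo = "stabilityai/stable-diffusion-2"
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo, subfolder="text_encoder")

ids = tokenizer("a photo of a cat", padding="max_length",
                max_length=tokenizer.model_max_length,
                return_tensors="pt").input_ids
with torch.no_grad():
    out = text_encoder(ids, output_hidden_states=True)
penultimate = out.hidden_states[-2]  # second-to-last layer
embedding = text_encoder.text_model.final_layer_norm(penultimate)
```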
Doesn't seem to work for me on the 768-v model using the v2 config for v:
TypeError: EulerDiscreteScheduler.__init__() got an unexpected keyword argument 'prediction_type'
Originally posted by @devilismyfriend in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326220609
You need to use the absolute latest Diffusers and merge this PR (or use my branch which has it in it) https://github.com/huggingface/diffusers/pull/1386
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326243809
(My branch is at https://github.com/hafriedlander/diffusers/tree/stable_diffusion_2)
Originally posted by @hafriedlander in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1326245339
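Once that PR is in (or on the branch above), constructing the scheduler for the 768-v model would presumably look something like this; the prediction_type kwarg is the one the PR adds:

```python
# Sketch: load SD 2.0 with a v-prediction Euler scheduler via diffusers.
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline

repo = "stabilityai/stable-diffusion-2"
scheduler = EulerDiscreteScheduler.from_pretrained(
    repo, subfolder="scheduler", prediction_type="v_prediction",
)
pipe = StableDiffusionPipeline.from_pretrained(repo, scheduler=scheduler)
image = pipe("an astronaut riding a horse", height=768, width=768).images[0]
```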
Speaking of computational power, is distributed training of Stable Diffusion across botnets possible?
While I haven't personally done anything like that myself, a lot of things like DeepSpeed, ColossalAI, etc. are basically designed for distributed training, and there are other things like StableHorde for more ad-hoc/distributed setups.
So, tl;dr: almost certainly.
Reading through the thread half awake and making the following assumptions:
- 2.0 is possible in automatic1111?
- But 2.0 is censored into oblivion if you want to have a female form (not talking porn, but your standard sexy-character D&D art)?
- 2.0 prompt crafting is completely different or broken in automatic1111?
- NSFW stuff is impossible for the community to add to 2.0 unless they have a supercomputer?

Have I skimmed through the issues correctly?
Yup, for the most part. I'm not 100% sure if model training resumed from one of the previous uncensored checkpoints, or if it was restarted from scratch, as my post from before DID have some naughty bits, despite those having been filtered out of the current dataset.
2.0 is possible in automatic1111?
Yup, should be.
But 2.0 is censored into oblivion if you want to have a female form (not talking porn but your standard sexy character d&d art)?
Sort of depends on how you define 'censored into oblivion', I suppose...
2.0 prompt crafting is completely different or broken in automatic1111?
I haven't played with this myself, but I did get that impression. SD v2.0 is trained using a completely different language model than v1.4/v1.5, so it would make sense to me that the exact same prompts as earlier aren't going to work the exact same way anymore.
NSFW stuff is impossible to add by the community in 2.0 unless they have a supercomputer?
I don't think that's true.
Have I skimmed through the issues correctly?
Seemingly more or less :)
I'm not 100% sure if model training resumed from one of the previous uncensored checkpoints, or if it was restarted from scratch
https://github.com/Stability-AI/stablediffusion/blob/main/modelcard.md
We currently provide the following checkpoints:
- 512-base-ema.ckpt: 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. 850k steps at resolution 512x512 on the same dataset with resolution >= 512x512.
While I don't know this for certain, that very much reads to me as though it was trained from scratch. For previous model releases they have been pretty explicit in mentioning when it was resumed from a previous checkpoint. I believe given they're using a completely different text encoder that they probably wouldn't have been able to resume from a v1.4 model either.
Apologies to everybody waiting on support for v2. I am still extremely busy with my midterms and unable to work on this in any capacity.
Remarks derived from an initial glance:
My xformers code is most likely rendered obsolete by the new native support, though my wheels are still necessary. It seems like the current CLIP hijack - and any related features - will fail as is. We'd need to ship v1-inference.yaml and modify the default config to support both older v1 models (like the huge library of dreambooth models) and v2.
Aside from those, simply switching the repo to the StabilityAI one should allow loading. Actual, proper inference would likely require updating samplers at least.
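One possible way to tell v1 and v2 checkpoints apart without shipping per-model configs would be sniffing the state dict. A sketch (config file names assumed from this thread, not the webui's eventual solution):

```python
# Sketch: guess v1 vs v2 config from the cross-attention context width.
# v2's OpenCLIP embeddings are 1024-dim, v1's CLIP embeddings are 768-dim.
import torch

def guess_config(ckpt_path: str) -> str:
    sd = torch.load(ckpt_path, map_location="cpu")
    sd = sd.get("state_dict", sd)
    key = ("model.diffusion_model.input_blocks.1.1."
           "transformer_blocks.0.attn2.to_k.weight")
    if key in sd and sd[key].shape[1] == 1024:
        return "v2-inference-v.yaml"  # assumed name, per the launch command above
    return "v1-inference.yaml"
```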
Testing in progress on the Horde: https://github.com/Sygil-Dev/nataili/tree/v2. Try out Stable Diffusion 2.0 on our UIs:
- https://tinybots.net/artbot
- https://aqualxx.github.io/stable-ui/
- https://dbzer0.itch.io/lucid-creations

https://sigmoid.social/@stablehorde/109398715339480426
SD 2.0
- [x] Initial implementation ready for testing
- [ ] img2img
- [ ] inpainting
- [ ] k_diffusers support
Originally posted by @AlRlC in https://github.com/Sygil-Dev/nataili/issues/67#issuecomment-1326385645
- Create pathsV2.py: https://github.com/TheLastBen/fast-stable-diffusion/commit/11fd38bfbd2f1ed42449b37ba88ba324ff42ba43
- Support for SD V.2: https://github.com/TheLastBen/fast-stable-diffusion/commit/fe445d986f08a1134f26f5efcd1c0829f34bc481
- fix: https://github.com/TheLastBen/fast-stable-diffusion/commit/da9b38010c2edc8fcccf2b0b70f321af30c0ecb8
- fix: https://github.com/TheLastBen/fast-stable-diffusion/commit/6c84728c72bd9735b0a5be4c62a292554c3b41d1
- fix: https://github.com/TheLastBen/fast-stable-diffusion/commit/04ba92b1931ab6aa0269a0516640f8874b004885
- Create sd_hijackV2.py: https://github.com/TheLastBen/fast-stable-diffusion/commit/ebea13401da873b3420fdf6f0fa02df567534a55
- Create sd_samplersV2.py: https://github.com/TheLastBen/fast-stable-diffusion/commit/88496f5199c82e9c5ee2ae40bc980140d8cd4ce5
- fix V2: https://github.com/TheLastBen/fast-stable-diffusion/commit/f324b3d85473d308ebeefb03de58ae6eb9070f42
Originally posted by @0xdevalias in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326446674
Should work now, make sure you check the box "redownload original model" when choosing V2
Requires more than 12GB of RAM for now, so free Colab probably won't suffice.
Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326461962
Managed to get it working on free Google Colab with my fork of the automatic1111 repo, which is compatible with both the base 512 and the V 768 model (if you enable the v-prediction checkbox), and also with old models if you don't specify a --config parameter (though DDIM and PLMS sampling seem to be broken with the new Stability-AI/stablediffusion repo):
https://colab.research.google.com/drive/1ayH6PUri-vvTXhaoL3NEZr_iVvv2qosR
Hi, is it possible to just use the new "depth2img" feature in old models? This is the only improvement I'm interested in.
@RaouleMenard
There is a model in v2 that can accept depth information generated by another model, but the v1 model does not have such a feature, so it seems difficult. It would be possible to generate a mask from depth information, but it would be inherently different from v2's.
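To make the "mask from depth information" idea concrete, a rough sketch using MiDaS from torch.hub (an approximation only; v2's depth model conditions on depth directly rather than masking):

```python
# Sketch: estimate depth with MiDaS and save it as a grayscale mask that
# could be fed to v1 img2img/inpainting as a crude stand-in for depth2img.
import numpy as np
import torch
from PIL import Image

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
tf = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = np.array(Image.open("input.png").convert("RGB"))
with torch.no_grad():
    depth = midas(tf(img)).squeeze().cpu().numpy()

# normalize to 0-255 and save as a grayscale mask
depth = (255 * (depth - depth.min()) / (np.ptp(depth) + 1e-8)).astype(np.uint8)
Image.fromarray(depth).save("depth_mask.png")
```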
2.0 model is not good at all, rubbish work!
While that seems to be the consensus, given the removal of art styles... it would still be nice to use it in automatic1111 so we can make up our own minds. All about choice, dear fellow.
A picture of a cat with little (orange:1.5), black, and white fur
Negative prompt: blurry
Steps: 14, Sampler: DPM++ stochastic, CFG scale: 7, Seed: 2157866423, Size: 1024x768, Model hash: 2c02b20a
Fixed the attention and emphasis part; not sure how to implement the CLIP stop-layers feature...
That's very accurate! Hmmm, duplicating automatic1111 now; will update one copy in a bit.
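For anyone curious what the "attention and emphasis part" involves, here's a toy parser for the (word:1.5) syntax (a simplification; the webui's real parser also handles nesting and [word:n] scheduling):

```python
# Toy parser for A1111-style "(text:weight)" emphasis; untagged text gets 1.0.
import re

def parse_emphasis(prompt: str):
    pieces = []
    for m in re.finditer(r"\(([^:()]+):([\d.]+)\)|([^()]+)", prompt):
        if m.group(1):
            pieces.append((m.group(1).strip(), float(m.group(2))))
        elif m.group(3) and m.group(3).strip():
            pieces.append((m.group(3).strip(), 1.0))
    return pieces

print(parse_emphasis("A picture of a cat with little (orange:1.5), black, and white fur"))
# [('A picture of a cat with little', 1.0), ('orange', 1.5), (', black, and white fur', 1.0)]
```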
The new Stability AI GitHub repo appears to be located here now: https://github.com/Stability-AI/stablediffusion
How did you get that to work in automatic1111?
Loading weights [2c02b20a] from D:\AIArt\NewVersion\SD\models\Stable-diffusion\768-v-ema.ckpt
Global Step: 140000
Traceback (most recent call last):
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
output = await app.blocks.process_api(
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\AIArt\NewVersion\SD\modules\ui.py", line 1664, in <lambda>
fn=lambda value, k=k: run_settings_single(value, key=k),
File "D:\AIArt\NewVersion\SD\modules\ui.py", line 1505, in run_settings_single
if not opts.set(key, value):
File "D:\AIArt\NewVersion\SD\modules\shared.py", line 454, in set
self.data_labels[key].onchange()
File "D:\AIArt\NewVersion\SD\webui.py", line 44, in f
res = func(*args, **kwargs)
File "D:\AIArt\NewVersion\SD\webui.py", line 86, in <lambda>
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
File "D:\AIArt\NewVersion\SD\modules\sd_models.py", line 289, in reload_model_weights
load_model_weights(sd_model, checkpoint_info)
File "D:\AIArt\NewVersion\SD\modules\sd_models.py", line 182, in load_model_weights
model.load_state_dict(sd, strict=False)
File "D:\AIArt\NewVersion\SD\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
Is there an existing issue for this?
What would your feature do?
Support the new 768x768 model 2.0 from Stability-AI and all the other new models that just got released.
Proposed workflow
Select 768-v-ema.ckpt from the list.

Links
- https://huggingface.co/stabilityai/stable-diffusion-2
- https://huggingface.co/stabilityai/stable-diffusion-2-base
- https://huggingface.co/stabilityai/stable-diffusion-2-depth
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
- https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main

- 768 model download link on HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/768-v-ema.ckpt
- 512 base model download link: https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt
- 512 depth model download link: https://huggingface.co/stabilityai/stable-diffusion-2-depth/blob/main/512-depth-ema.ckpt
- 512 inpainting model download link: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt
- new x4 upscaler download link: https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.ckpt
Additional information
Here is the error message you get when trying to load the 768x768 2.0 model with the current release (the size-mismatch RuntimeError shown above).