AugmentedRealityCat closed this issue 1 year ago.
It gives an error trying to load old Textual Inversion embeddings with the new models, but that can't be helped.
I'm 99% sure old embeddings won't work with 2.0 anyway, because of the retrained text encoder. You'll probably get nonsense.
Can I presume the models are so fundamentally different that gen 1 and gen 2 can't be merged?
Edit: Also, wouldn't it be possible to just merge the weights from gen 1 and gen 2 while keeping the architecture, text encoder, etc. from gen 2? Would it give nonsense, if it's possible at all?
Thanks for the link. How do we run it? I tried running user2's branch, but I couldn't get it working. Appreciate your help.
It's meant to be run locally. I'm assuming you have installed the misleadingly named "SUPER Stable diffusion 2.0" (actually 1.4) according to this video's instructions from months ago: https://youtu.be/vg8-NSbaWZI or something similar that uses Automatic1111 on your own computer.
You'll need to download either 768-v-ema.ckpt (for generating images natively at 768x768), 512-base-ema.ckpt (for generating images at 512x512), or 512-inpainting-ema.ckpt (for inpainting) and put it in your A:\AI\Super SD 2.0\stable-diffusion-webui\models\Stable-diffusion folder (or whatever you called it). Don't rename the start of the file: it needs the 512- or 768-v- prefix, and needs the word inpainting somewhere in the middle if it's an inpainting model. Or you can call it whatever you want, as long as you put the appropriate .yaml file next to it with the same name.
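To illustrate the two naming options described above (file names here are examples):

```
models/Stable-diffusion/768-v-ema.ckpt           # recognized by the 768-v- prefix
models/Stable-diffusion/512-inpainting-ema.ckpt  # recognized by 512- and "inpainting"
models/Stable-diffusion/myCustomModel.ckpt       # any name you like...
models/Stable-diffusion/myCustomModel.yaml       # ...as long as this sits beside it
```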
Go into your A:\AI\Super SD 2.0\stable-diffusion-webui\ folder and use git to create a new branch (so your master branch will still work when the real Automatic1111 version comes out) and switch to that branch. Then do a git pull from my https://github.com/CarlKenner/stable-diffusion-webui.git remote, with the branch set to dev2-carl. Then run webui-user.bat and wait for ages while it downloads and installs the modules it needs, and copy the IP address it gives you into your web browser. Then just use it like normal, and load whatever models you want from any Stable Diffusion version.
Of course the real version will come out soon, so you'll need to switch back to your master branch at some point in the future and do a git pull.
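Roughly, the workflow above as commands (run from your stable-diffusion-webui folder; the branch name sd2-test is just an example):

```shell
git checkout -b sd2-test      # keep master untouched for the official release
git pull https://github.com/CarlKenner/stable-diffusion-webui.git dev2-carl
webui-user.bat                # first run downloads and installs dependencies
# later, to return to the official version:
git checkout master
git pull
```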
Can I presume the models are so fundamentally different that gen 1 and gen 2 can't be merged?
Edit: Also, wouldn't it be possible to just merge the weights from gen 1 and gen 2 while keeping the architecture, text encoder, etc. from gen 2? Would it give nonsense, if it's possible at all?
I don't think that's possible because the weights are based on the inputs from the text encoder, which would be totally different. Words would end up having random meanings.
@aniketgore
The issue for you might be due to the fact that the commit hash wasn't changed in launch.py, only in webui-user.bat. I forked their repo and added the minor changes: https://github.com/acheong08/stable-diffusion-webui/tree/SDV2.0
There's a small detail in the implementation relating to the open_clip tokenizer that we were doing differently which should now be fixed with this commit: https://github.com/uservar/stable-diffusion-webui/commit/49df7c9aca39bccec623dd54ae33fb6963e41464
sot_token = _tokenizer.encoder["<start_of_text>"]
eot_token = _tokenizer.encoder["<end_of_text>"]
all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)

for i, tokens in enumerate(all_tokens):
    if len(tokens) > context_length:
        tokens = tokens[:context_length]  # Truncate
        tokens[-1] = eot_token
    result[i, :len(tokens)] = torch.tensor(tokens)
The tokens should be something like this: [start_of_text, token1, token2, token3, end_of_text, 0, 0, 0, ..., 0]
Instead of like this: [start_of_text, token1, token2, token3, end_of_text, end_of_text, end_of_text, end_of_text, ..., end_of_text]
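A toy illustration of the two padding schemes (the token IDs and context length below are made up for the example; 49406/49407 are just the conventional CLIP start/end IDs):

```python
SOT, EOT, PAD, CONTEXT = 49406, 49407, 0, 10  # toy values, not the real context length

def pad_with_zeros(tokens):
    """Correct scheme: one end_of_text token, then zero padding."""
    row = [SOT] + tokens + [EOT]
    return row + [PAD] * (CONTEXT - len(row))

def pad_with_eot(tokens):
    """Previous (incorrect) scheme: fill the rest with end_of_text tokens."""
    row = [SOT] + tokens + [EOT]
    return row + [EOT] * (CONTEXT - len(row))

print(pad_with_zeros([1, 2, 3]))  # [49406, 1, 2, 3, 49407, 0, 0, 0, 0, 0]
print(pad_with_eot([1, 2, 3]))    # [49406, 1, 2, 3, 49407, 49407, 49407, 49407, 49407, 49407]
```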
There's probably still a few small bugs remaining but things are looking good in terms of supporting the 2.0 models so far.
I feel like we need a better way of determining model architecture that doesn't rely on the filename.
Don't rename the start of the file: it needs the 512- or 768-v- prefix, and needs the word inpainting somewhere in the middle if it's an inpainting model. Or you can call it whatever you want, as long as you put the appropriate .yaml file next to it with the same name.
Doesn't work when in a subfolder...
@ carl/uservar... v-model detection is broken right now, can't use .get() in the way it's currently used. The change below fixes it.
@CarlKenner Your branch just gives me mostly noise using 768 weights (& yes I am generating at 768x768) the 512 weights work fine though. uservar's repo is working fine with both
Same here. It's noise sometimes and a terrible image other times.
if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.
Thanks. Is uservar/dev2 the one to run locally? Also this is a newb question but how do I clone/pull/merge/overwrite the carlkenner branch without having to redownload all the dependencies and such
Thanks, uservar's version works much better; just switched to that. Wonder when some of these will be merged by @AUTOMATIC1111 into the main repo?
Is it possible to fix the upscaler (x4-upscaler-ema.ckpt) to work in img2img? Right now it throws this error:

File "/home/arpi/stable/GUI/v2/uservar/stable-diffusion-webui/modules/sd_samplers.py", line 437, in sample_img2img
    xi = x + noise * sigma_sched[0]
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 3
It's better to create an issue in the appropriate repo of the webui you are using atm.
As bad as 2.0 is (and yeah that's my opinion) we could do with official support for it in automatic1111 instead of having to use forks.
Just please also support 1.5
If there is a way to use the older CLIP with 2.0 in Stable Diffusion (I don't understand that, but apparently it will somehow allow use of missing content... yeah, I really don't understand it), then that would be a bonus to have.
if you're getting totally unusable images, it's because of the buggy v-objective model detection I mentioned above, which is fixed on uservar/dev2 now.
Yeah, that's kinda what I was saying. I think uservar's /dev2 is the way to go atm.
I agree. In the settings I would like to switch between 1.5 and 2.0, the whole thing including CLIP if possible, not just the checkpoint file or a single aspect of 2.0. I know this may not be possible, though, and I understand if it isn't. I hope this can be made to happen.
Is someone working on this? Or is there a working PR? I noticed AUTOMATIC1111 has been away for the past 4 days
I'm sure with the 2.0 update everyone's just real busy figuring out how to bring this into the WebUI, since I've heard it's a pretty breaking update. I'd give it a few days, but it sounds like good progress is being made towards a PR from what I can gather.
Thanks. Is uservar/dev2 the one to run locally? Also this is a newb question but how do I clone/pull/merge/overwrite the carlkenner branch without having to redownload all the dependencies and such
The dependencies are the same, as is most of the code, so you can just pull uservar/dev2 on top of the existing branch if you want. Anyway, I just updated my branch with his changes, so you could just pull my branch again.
Is someone working on this?
Lots of people are working on it.
Or is there a working PR?
I figured my code wasn't quite ready for a pull request, so I didn't create one. But currently, my dev2-carl branch or the uservar/dev2 branch is mostly working. Although it does include a few of uservar's changes that aren't directly related to SD 2.0 support, so maybe they should be separated into another branch before anyone makes a pull request.
I don't know whether it's better to merge partial support in as soon as it's ready or wait for full support before anyone does a pull request.
I noticed AUTOMATIC1111 has been away for the past 4 days
I guess there's no rush for us to make a pull request then.
Is it possible to fix the upscaler (x4-upscaler-ema.ckpt) to work in img2img?
I'm sure it's possible, I just haven't had a chance to work on the upscaler much. When I first tested generating an image with the upscaler, I got an out of RAM (not VRAM) error.
I feel like we need a better way of determining model architecture that doesn't rely on the filename.
That's currently how Automatic1111 handles determining model architecture for the inpainting model, so it's not new. Another clue we could use for detecting 2.0 models is the size, they all seem to be > 5,000,000,000 bytes but less than 6,000,000,000. That doesn't help narrow down which of the five 2.0 architectures it is though. Or we could look at the hashes. I don't know how to actually load and read the checkpoint file to see what architecture it contains.
It's also possible (in theory, I didn't test it) to put the appropriate .yaml file beside it with the same name.
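One possible sketch of reading the checkpoint contents instead: load the state-dict key names and look for layers unique to each text encoder. The key prefixes below are my assumption about how the v1 (CLIP) and v2 (open_clip) encoders are nested in the checkpoint, so verify them against real files before relying on this:

```python
def guess_sd_version(state_dict_keys):
    """Guess v1 vs v2 from state-dict key names (the prefixes are assumptions)."""
    keys = list(state_dict_keys)
    if any(k.startswith("cond_stage_model.model.") for k in keys):
        return "v2"  # open_clip-style text encoder
    if any(k.startswith("cond_stage_model.transformer.") for k in keys):
        return "v1"  # original CLIP text encoder
    return "unknown"

# Synthetic stand-ins for torch.load(path)["state_dict"].keys()
v1_keys = ["cond_stage_model.transformer.text_model.embeddings.token_embedding.weight"]
v2_keys = ["cond_stage_model.model.transformer.resblocks.0.attn.in_proj_weight"]
print(guess_sd_version(v1_keys))  # v1
print(guess_sd_version(v2_keys))  # v2
```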
As bad as 2.0 is (and yeah that's my opinion) we could do with official support for it in automatic1111 instead of having to use forks.
These aren't really forks. There are a few other unrelated bug fixes in the uservar/dev2 branch, but mostly we're just coding support into the latest version of automatic1111 so it can become a pull request.
Just please also support 1.5
It does.
If there is a way to use the older CLIP with 2.0 in Stable Diffusion (I don't understand that, but apparently it will somehow allow use of missing content... yeah, I really don't understand it), then that would be a bonus to have.
I don't think so. The new model is trained on the outputs of the new clip, if you fed in the old clip's outputs you'd just get randomly shuffled meanings of words.
The better way to access the missing content that is supposedly in the model but not in CLIP would be to train an Aesthetic Gradient (or possibly a Textual Inversion?). Unfortunately, last time I checked, I didn't have enough VRAM to use Aesthetic Gradients, even though 4GB is enough to train them. So someone else might need to get Aesthetic Gradients working in 2.0.
It's also theoretically possible to retrain the text encoder separately from the model itself, I think a Chinese team did that to make a version that understands Chinese but still generates the same images.
But the problem isn't just the new CLIP. Don't get your hopes up that all the styles and celebrities are still fully learned somewhere in the latent space just waiting to be discovered somehow. Maybe they are, but maybe it never learned concepts it doesn't have words for. I don't know.
@CarlKenner What about support for CLIP guidance? So you have the option to use proper CLIP guidance instead of the frozen CLIP models in the AUTO1111 UI. Midjourney has proper CLIP guidance, so does DreamStudio, and someone on the Stability discord was also running v2.0 with proper CLIP guidance and the results seemed much better. Why can't that be supported?
@CarlKenner My proposals for determining v1 vs v2 models:
- Separated folders for v1 and v2 under ./models or ./models/stable-diffusion.
- Using file hashes. (Will cause hassles with new additions but easy to maintain)
- Including inference yaml files per checkpoint with the same name to be loaded with checkpoints.
Solutions dependent on name and size are not future proof and can cause problems later, since we already have some 7GB checkpoints for v1.5, and being dependent on naming means people can not rename checkpoints to organize them.
This will be a problem for custom trained models like DreamBooth and finetunes.
We're using Python, so maybe a try/except block would be a better option. We could then save which arch each model uses, and use that after reloading.
We also need to plan for future models having different architectures as well, so that it's easy to add support for them.
I'm a fan of the simplest solution, and having a models/stable-diffusion-v1 and models/stable-diffusion-v2 directory (Along with detection/support for the old models dir, with warnings printed informing the user of the change) sounds like the most bulletproof solution at the moment.
Why not just leave it up to the user:
- Provide a way for users to classify it. For example, a folder like you said, such as models/v1 or models/v2. Or maybe a file extension such as .v1.ckpt or .v2.ckpt that can automatically be used.
- Have a default selected version (v1 or v2)
- Any model not known just uses the default one.
The same thing happens with vae files: without a vae file present under the same name, the WebUI can use a default one (such as none, or a particular one), but if one is present then you can optionally use that.
It just provides a lot of flexibility for everyone this way.
I'd even default it to v2 only on new installations, and keep it at v1 if it's an existing installation, to further make the update very seamless.
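The folder/suffix/default scheme proposed above could look like this minimal sketch (the folder names, suffixes, and default are all illustrative, not an existing webui API):

```python
from pathlib import Path

def classify_checkpoint(path, default="v2"):
    """Classify a checkpoint as v1/v2 by folder, then filename suffix, then default."""
    p = Path(path)
    for part in p.parts[:-1]:     # folder convention: models/v1/... or models/v2/...
        if part in ("v1", "v2"):
            return part
    suffixes = p.suffixes         # extension convention: model.v1.ckpt / model.v2.ckpt
    if len(suffixes) >= 2 and suffixes[-2] in (".v1", ".v2"):
        return suffixes[-2].lstrip(".")
    return default                # unknown models fall back to the default version

print(classify_checkpoint("models/v1/oldModel.ckpt"))  # v1
print(classify_checkpoint("models/anything.v2.ckpt"))  # v2
print(classify_checkpoint("models/mystery.ckpt"))      # v2 (the default)
```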
We are using Python, where it's easier to ask for forgiveness than it is to ask for permission. If there's no way to know what the model architecture is, then we can simply do this:
def load_model(path):
    model = None
    try:
        model = load_model_with_v1(path)
    except:
        pass
    try:
        model = load_model_with_v1_inpainting(path)
    except:
        pass
    try:
        model = load_model_with_v2_512x512(path)
    except:
        pass
    try:
        model = load_model_with_v2_768x768(path)
    except:
        pass
    try:
        model = load_model_with_v2_depth(path)
    except:
        pass
    assert model is not None, "Model architecture not recognized"
    return model
We don't need to require manually renaming the files or anything like that. Try & Except blocks seem like the most user friendly way to do this, especially for those with very little technical skills.
@ProGamerGov Wouldn't that in some cases load the model into memory, then need to flush it, and so on until it gets to the bottom of the try/except? What would we do in a future where you have 100+ different types of models? How long would someone need to wait to get a model loaded in such a case?
The solution I proposed above doesn't require renaming or moving, it's something optional you can do, and would otherwise work out of the box for existing installations.
A try/catch would really slow things down, as it would attempt to load the whole model as v1 and then v2 and onward every single time. Also, if they release a v3 later on, you'd have to add to the try/catch, so it's not flexible for the future and would slow down more.
Another solution would be to allow webui to read the ./models directory recursively. This way you can group multiple checkpoints with a single inference yaml file. But then the separate-directories solution would be very close to that.
The try/catch blocks would only be needed the first time you run the model. We can store the arch, and then use that when loading it.
It's also possible to get the weight names without fully loading the model, so we could simply match the list of weights to their architecture:
model = torch.load(path)
weights = list(model["weights"]) # simple example
model_full = match_weight_names_to_arch(weights)
model_full.load(model)
Separated folders for v1 and v2 under ./models or ./models/stable-diffusion.
I'm a fan of the simplest solution, and having a models/stable-diffusion-v1 and models/stable-diffusion-v2 directory (Along with detection/support for the old models dir, with warnings printed informing the user of the change) sounds like the most bulletproof solution at the moment.
There are more than 2 different model architectures though. A lot more.
I may have accidentally implemented this feature already though, by being bad at python. 😂 Try making a subfolder called "768-v-models" and putting trained 768-v- models in there with random names. Or a subfolder called "v1" and putting v1 models in there.
We are using Python, where it's easier to ask for forgiveness than it is to ask for permission.
Loading 5GB files isn't easy, and it takes like 10 minutes. And I'm not sure it would even know if it got it wrong.
Also, the current implementation expects to know what model type each model is when it makes the list of models (even though I don't think it uses that information).
But it's an interesting idea for later.
Why not just leave it up to the user
Users are idiots (including me). And would just wonder why Stable Diffusion isn't working.
- Have a default selected version (v1 or v2)
- Any model not known just uses the default one.
That's not a terrible idea though.
Provide a way for users to classify it.
There already is one. Put the appropriate .yaml file next to it with the same name. Currently, that may not work with inpainting models that don't include the word "inpainting" though.
Or maybe a file extension such as .v1.ckpt or .v2.ckpt that can automatically be used.
We're already doing that, just at the start of the filename. E.g. you can call your trained model "768-v-Christina Hendricks.ckpt".
The try/catch blocks would only be needed the first time you run the model. We can store the arch, and then use that when loading it.
This solution does work: you can store the model hashes in a lookup file to reference, and let the AI determine the version on first run. But:
- Stable Diffusion already has a system in place for VAE files, and using existing systems people are familiar with translates better to the end user.
Your solution is good though, and does abstract things away from the end user, so I think both methods are great ideas.
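A minimal sketch of that lookup-file idea: hash the checkpoint once, cache the detected architecture in a small JSON file, and skip detection on later runs. The cache file name and the detect_arch callback are hypothetical, not existing webui code:

```python
import hashlib
import json
import os

def file_sha256(path, chunk_size=1 << 20):
    """Hash the checkpoint in chunks so huge files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def get_architecture(ckpt_path, detect_arch, cache_path="model_arch_cache.json"):
    """Return the cached architecture for a checkpoint, running detection only once."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    digest = file_sha256(ckpt_path)
    if digest not in cache:
        cache[digest] = detect_arch(ckpt_path)  # the expensive step, first run only
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return cache[digest]
```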
Speaking of VAEs, I think Stable Diffusion 2.0 comes with some, has anyone experimented with them?
By the way, please don't think I should be in charge of implementing any of these features. I'm not very familiar with how Stable Diffusion works, and barely know how to program in Python. Plus I'm currently sick, and haven't been getting much sleep. So if anyone wants to have a go at implementing or fixing things themselves, be my guest.
It's also possible to get the weight names without fully loading the model, so we could simply match the list of weights to their architecture:
model = torch.load(path)
weights = list(model["weights"])  # simple example
model_full = match_weight_names_to_arch(weights)
model_full.load(model)
I personally think this is the correct approach. Although it does require a bit of reordering the code, since right now the code expects to know what config it's using before loading the weights, not after.
I also think detecting architecture by checkpoint filename is OK as a temporary stopgap solution - I wouldn't hold off on merging 2.0 support just because arch detection by ckpt contents isn't implemented yet.
There are more than 2 different model architectures though. A lot more.
In v1, if it doesn't pertain to text-to-image, it doesn't go in my models folder. In v2, if you have downloaded models that are not text-to-image but other types, would it be wise to dump them all in the models folder? That isn't how v1 works, at least to my knowledge.
Users are idiots (including me). And would just wonder why Stable Diffusion isn't working.
Having that solution will work for precisely this case; the idea is to not make the user do anything special on update. Everything just works, everything is seamless, and it requires no tinkering with settings, renaming files, or extra reading.
Put the appropriate .yaml file next to it with the same name.
But yaml files are configuration files; I feel that'd be more complicated than just an optional filename suffix or a special directory inside models.
We're already doing that, just at the start of the filename.
But the start of the filename is the first thing people look at to see what model they're using, and it won't be sorted right.
Speaking of VAEs, I think Stable Diffusion 2.0 comes with some, has anyone experimented with them?
SD v2.0 doesn't come with VAEs, but there were two new ones released about a month ago, and I have been testing v2.0 with them. In my opinion they work slightly better with v2.0 than the default.
In v1, if it doesn't pertain to text-to-image, it doesn't go in my models folder. In v2, if you have downloaded models that are not text-to-image but other types, would it be wise to dump them all in the models folder? That isn't how v1 works, at least to my knowledge.
All those models I listed are text to image.
Having that solution will work for precisely this case; the idea is to not make the user do anything special on update. Everything just works, everything is seamless, and it requires no tinkering with settings, renaming files, or extra reading.
I don't see how.
But yaml files are configuration files
yaml files are essentially part of the model. Frankly, it would make much more sense for models to be published and distributed with their corresponding yaml file beside them. Anyway, that feature already existed, I didn't add it.
SD v2.0 doesn't come with VAEs, but there were two new ones released about a month ago, and I have been testing v2.0 with them. In my opinion they work slightly better with v2.0 than the default.
Are you sure? There are VAE folders in all the 2.0 models on huggingface.
They are .bin files in the 2.0 VAE folders; can we use .bin files? The month-old VAEs are .ckpt.
I keep getting errors on uservar's webui that prevent me from even launching the UI:
it gets to this point then throws all this stuff...
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
File "A:\Desktop\00 AI Images\stable-diffusion-webui\launch.py", line 259, in
diffusers==0.9.0 with Stable Diffusion 2 is live! https://github.com/huggingface/diffusers/releases/tag/v0.9.0
Originally posted by @anton-l in https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327731012
I've noticed that with uservar's repo on Google Colab's free tier (via this notebook, can't remember where I found it), sometimes it can load the models fine, but other times it ^Cs, even after reboots. What's weird is that it doesn't always give me trouble running the v2 models, just sometimes. Switching models on the spot seems to ^C most of the time, even switching from the bigger model to the smaller one, which makes very little sense to me.
Same here with uservar. I got it working, then after a reboot it stopped working, with errors like patrickmac's.
So I should reboot to double break it... to fix it!
I've had luck running it, and if it seems like it's taking longer than usual, stopping it, then rerunning it, without rebooting. Again, strange af.
No. Probably just go on a coke binge for a couple days til Auto1111 rises from the ashes as a glorious Phoenix and saves us.
Is there an existing issue for this?
What would your feature do?
Support the new 768x768 2.0 model from Stability-AI and all the other new models that just got released.
Proposed workflow
Select 768-v-ema.ckpt from the list.
Links
https://huggingface.co/stabilityai/stable-diffusion-2
https://huggingface.co/stabilityai/stable-diffusion-2-base
https://huggingface.co/stabilityai/stable-diffusion-2-depth
https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main
768 model download link on HuggingFace: https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/768-v-ema.ckpt
512 base model download link: https://huggingface.co/stabilityai/stable-diffusion-2-base/blob/main/512-base-ema.ckpt
512 depth model download link: https://huggingface.co/stabilityai/stable-diffusion-2-depth/blob/main/512-depth-ema.ckpt
512 inpainting model download link: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/512-inpainting-ema.ckpt
new x4 upscaler download link: https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.ckpt
Additional information
Here is the error message you get when trying to load the 768x768 2.0 model with the current release: