TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License

Training on latest 1.5 VAE seems to overfit compared to 1.4 #204

Open alejobrainz opened 2 years ago

alejobrainz commented 2 years ago

Hi. I updated the notebook to the latest changes, which include the 1.5 model and the VAE usage improvements, but when I trained a model (I tried 1500 and 3000 steps), the resulting inferences tend to overfit to the supplied photos.

I made a prompt to render a character we created, and used the webui to render it as 3D anime, making it look like a member of the rock band KISS. Here are some comparisons:

Character trained on 1.4, 3000 steps, 20 source pictures. Prompt: "A mid angle camera shot of a ca-whisperer as a Rockstar, based on KISS, with beard, horns, wings, tail, anime 3 D realistic shad…" [image grid-0004, seed 1025863238]

Character trained on the latest 1.5 notebook, 3000 steps, same 20 source pictures, same training seed, and same 200 training class pics. Same prompt. [image grid-0002, seed 2790290571]

The resulting images have the same camera angles and look as the supplied photos, apparently overfitting the model.

So I tried the same prompt with a person (my son), following the same procedure as above.

Person trained on 1.4, 3000 steps, 20 source pictures. Prompt: "A mid angle camera shot of SebastianG2022 as a rockstar from KISS, anime 3 D realistic shaded; by makoto shinkai and jeremy lipk…" [image grid-0009, seed 1406060766]

Same person trained on 1.5, 3000 steps, same 20 source pictures, same training seed, and same 200 training class pics. Same prompt. [image grid-0010, seed 1394774983]

Anyone else experiencing this? I have not been able to run DreamBooth on the latest notebook without overfitting the model. I tried training at 1500 steps too. Any ideas?

TheLastBen commented 2 years ago

Thanks for the detailed explanation, I will work on that immediately

alejobrainz commented 2 years ago

BTW, your notebook is absolutely excellent. A small tip: allow specifying a path for saving the rendered class pictures, so we can store them on Drive and reuse them in future trainings. I've resorted to adding an extra step to copy them, but it might be useful for others:

%cp -r "/content/data/$SUBJECT_TYPE" "/content/gdrive/MyDrive/sd/data"
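A small caveat with that step (my assumption about the Drive layout, not something the notebook guarantees): cp fails if the parent folders don't exist on Drive yet, so creating the destination first makes the cell safe to run on a fresh Drive:

!mkdir -p /content/gdrive/MyDrive/sd/data
%cp -r "/content/data/$SUBJECT_TYPE" "/content/gdrive/MyDrive/sd/data"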

TheLastBen commented 2 years ago

Thanks, I will add that option as a checkbox

TheLastBen commented 2 years ago

Did you check your class images? It looks like the script didn't take the class pictures into consideration. Also try low inference steps (30-50).

I tried it and it works well: 1500 steps, 28 instance images and 200 class images:

studio portrait of emlclrkc, natural colors, beautiful, attractive, natural skin, [[[freckles]]], [[skin pores]], realistic lighting, shot on nikon canon hasselblad, medium shot

[image]

same seed with the default model :

studio portrait of emilia clarke, natural colors, beautiful, attractive, natural skin, [[[freckles]]], [[skin pores]], realistic lighting, shot on nikon canon hasselblad, medium shot

[image]

alejobrainz commented 2 years ago

Interesting. Giving it another go including class image creation.

TheLastBen commented 2 years ago

Option to save class_images to gdrive added https://github.com/TheLastBen/fast-stable-diffusion/commit/c1e97d3550e5ad49e67a3935eb72f339be92ae6b

Krolitian commented 2 years ago

Also having an issue with v1.5. My v1.4 model worked perfectly with any style I applied to it, but the v1.5 model that's being created by this Colab no longer gives results beyond just my photorealistic face. I mean to be fair, it did a REALLY good job at replicating my face compared to v1.4, but now can ONLY do that. Can't apply any styles like "cartoon" in the prompt, always just returns me as a real-life picture. Even not putting myself in the prompt gives characters that share my features, and putting in animals results in it putting the animal's head on my body.

Some were thinking I overtrained it at 2k steps, but not only is that what I used for v1.4, at 1k steps it's definitely undertrained. Not sure what changed in the Colab in the last couple of days, but it seems to be causing all my attempts to completely overwrite the .ckpt with just the training data.

TheLastBen commented 2 years ago

@Krolitian How many steps do you use at inference? If you keep them below 50, the style will apply; give it a try. It might be the learning rate, which changed to 2e-6.

tadashiutau commented 2 years ago

> Also having an issue with v1.5. My v1.4 model worked perfectly with any style I applied to it, but the v1.5 model created by this Colab no longer gives results beyond just my photorealistic face. [...]

Apologies beforehand, English isn't my first language.

I'm having the same problem. I've tried multiple subjects, going through my instance images and removing any that aren't good, generating my own class images, training on different seeds, training at 1000, 1250, 1500 and 3000 steps, and no matter what I do, I get photorealistic interpretations of the subject. I can't apply styles or modify the subject in any meaningful way that doesn't look almost exactly like the training data. If I write a prompt that has nothing to do with the trained subject, it generates images fine, so I know the .ckpt isn't just the trained subject; I just can't make the trained subject interact with the rest of the model, if that makes sense.

Edit: I'm using the Google Colab.

Krolitian commented 2 years ago

> @Krolitian How many steps do you use at inference? If you keep them below 50, the style will apply; give it a try. It might be the learning rate, which changed to 2e-6.

Tried steps from 20 to 150, all with the same result. Even doing stuff like (((((art by Greg Rutkowski))))) just keeps the photorealism until I add so many ()'s that it corrupts. And I tried changing it to 1e-6 as someone suggested; that didn't solve the issue either.

> I'm having the same problem. [...] I just can't make the trained subject interact with the rest of the model, if that makes sense.

Yup, exactly my experience. It just looks like my training data, but it's not. Anything I try to change, even without the dreambooth token or class in the prompt, still gives a result that's 90% or more what the token is.

TheLastBen commented 2 years ago

In the prompt, don't add the class_name (person, guy, woman ...); use only the instance name (unique identifier). Make sure you're using the latest colab notebook.

I trained on 28 instance images and 200 autogenerated class images and couldn't reproduce the problem:

[result images]

giteeeeee commented 2 years ago

Just a side question about the learning rate. Is it a setting that you're supposed to manually adjust, like in textual inversion training? Or has it already been set up in the dreambooth colab and requires no attention?

TheLastBen commented 2 years ago

@giteeeeee there were experiments with various learning rates, and so far 2e-6 appears to be the most balanced.

r8200 commented 2 years ago

I can confirm the problem described above. I ran a new training on old photos from which I had already prepared a model before, and the result completely erases all styles and stylizations. I checked on today's colab and yes, the problem is still there. It only takes adding the keyword to the prompt for all styles to go away.

I trained the model on 100 subject photos, 500 additional selected class photos and 2000 steps; I tried a different number of steps but the result is the same.

A simple prompt with styling: Dreambooth man portrait, slightly smiling, disney, pixar, render, renderman, vray, cellsshading, cinematic lights [image grid-1525]

Prompt without the keyword, same model, same seed: man portrait, slightly smiling, disney, pixar, render, renderman, vray, cellsshading, cinematic lights [image grid-1526]

The old model trained on October 15 does not have this problem, trained on the same photos. [image grid-1529]

I don't think the problem is in SD 1.5; I had this problem the day before the SD 1.5 release.

TheLastBen commented 2 years ago

@r8200 in the cell "Start Dreambooth", change the setting "--gradient_accumulation_steps=1 --gradient_checkpointing \" to

--gradient_accumulation_steps=2 --gradient_checkpointing \

and try the same training (it will be slower)
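For orientation, the relevant lines of the launch command would then read something like this (a sketch; the surrounding flags are taken from the full command quoted later in this thread):

!accelerate launch /content/diffusers/examples/dreambooth/train_dreambooth.py \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  ...

With --train_batch_size=1, accumulating gradients over 2 micro-batches gives each optimizer update an effective batch size of 2 at the cost of two forward/backward passes per update, which is why the run is slower.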

r8200 commented 2 years ago

@r8200 in the cell "Start Dreambooth", change the setting "--gradient_accumulation_steps=1 --gradient_checkpointing " to

--gradient_accumulation_steps=2 --gradient_checkpointing \

and try the same training (it will be slower)

the result is the same except for the time =(

here are some more examples that show the difference

Model: trained 2000 steps, 100/500 photos, v1.5. Prompt: Dreambooth epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration [image grid-1581]

Model: trained 2000 steps, 100/500 photos, v1.5. Prompt: Man epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration [image grid-1583]

Model: not trained, v1.5. Prompt: Man epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration [image grid-1584]

Model: trained 3000 steps, 100/500 photos, v1.4 (Oct 15). Prompt: Dreambooth epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration [image grid-1587]

Model: trained 3000 steps, 100/500 photos, v1.4 (Oct 15). Prompt: Man epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration [image grid-1588]

TheLastBen commented 2 years ago

try 30 instance images and 200 autogenerated class images at 1500 steps, and make sure the instance images are 1:1.

Don't use the word "Dreambooth" in your prompt. I noticed you're not using the instance_name in your prompt; why?

r8200 commented 2 years ago

I don't actually use that word; I only entered it to make clear where the instance name goes in the prompt. The SUBJECT_TYPE I use is "man". I'll try tomorrow with fewer images, but in the old build everything worked with the same content.

TheLastBen commented 2 years ago

I strongly believe that the cause of this issue is overtraining, either through the steps or the class/instance images; try also using a multi-word subject_type.

emlclrkc epic portrait, 42 years old, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration

[image]

r8200 commented 2 years ago

Tried with the recommended settings: 100 shots, 200 class photos, 2000 steps, fp16. The result is exactly the same. Second attempt: 20 shots, 200 class photos, 2000 steps, fp16. It got a little bit better, but is still very far from what it used to be.

Test with the last model, prompt with the keyword: [image grid-1629]

Prompt without it: [image grid-1630]

In my opinion, the problem now is not in the accuracy of recognition, but in how the weights are written back into the model file. Their influence on the model is now excessive.

TheLastBen commented 2 years ago

it's the new text encoder training option that might've changed things, but the results are actually better. Keep the steps as low as 1500 and then play with the positive/negative prompts.

Krolitian commented 2 years ago

I think I figured out my issue with my latest run on the Shivam colab. I cut my training images from 14 to 8 and it works perfectly now. (The same issue happened on his colab with 14 images too, so it's not your fault; it seems to be an issue with v1.5 and a high number of training images.) Gonna try again on your colab once Google removes my GPU limit.

Krolitian commented 2 years ago

Turns out I was wrong. Doing exactly what I did on Shivam's colab on yours still results in images that only show photorealistic depictions of me, with no ability to change them with a prompt. Something in the Fast colab still seems to be causing this.

TheLastBen commented 2 years ago

I will train more models and see if I can reproduce the issue.

TheLastBen commented 2 years ago

I tried the learning rate 1e-6 and I'm getting great results. It is imperative to cap the training steps below 1600; the new text encoder method greatly improves training and there is no need to go much over 1600 steps, especially for a single person/object. The secret is to not over-train.

I trained a model on 2 persons at the same time with 400 good quality class images (200 for each gender) and 30 instance images per person, 1600 steps, 1e-6 lr:

[result images]

mauzus commented 2 years ago

Yeah, I had the same results as others, but once I stopped using the huge number of pictures and steps I used to use, everything works great, even much better than before.

I used to need 5000 steps and 50 pictures for decent results; now just 10 pictures and 1500 steps, and everything is much better. 👍

TheLastBen commented 2 years ago

It's because of the "--train_text_encoder" feature: it gives much better results with much less effort, but it's susceptible to overtraining and overfitting if you keep the habits from the old method.

r8200 commented 2 years ago

I tried the latest version on default settings, 18 photos for training.

Judging by the results, there are slight improvements in style transfer. But unfortunately part of the "style" now looks like a glitch; perhaps this is due to the finer learning rate of 1e-6. Add to the problems that a very large number of photos come out black and white, or with almost no colors.

Examples: note the chaotic lines in the trained model; they are present across different prompts and are not related to the style.

Trained model, prompt with the keyword: [image grid-1680]

Trained model, prompt without the keyword: [image grid-1694]

Untrained model, prompt without the keyword: [image grid-1679]

Another one. Trained model, prompt with the keyword: [image grid-1686]

Trained model, prompt without the keyword: [image grid-1687]

All these new improvements are certainly interesting, but without proper style transfer they are beside the point.

Is it possible to use the old version somehow? What does that require?

mauzus commented 2 years ago

Chaotic lines and unsaturated colors are exactly some of the same issues I had when I was overtraining with the new method.

Maybe try using just 10 photos for training and 200 for your class, and then start tweaking from there...

If you still want to go back to the old method, just remove --train_text_encoder and change --learning_rate to 5e-6 in the Start DreamBooth cell. I think that should give you the same old results...
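Concretely, in the "Start DreamBooth" cell that would mean deleting the line

  --train_text_encoder \

and changing

  --learning_rate=1e-6 \

to

  --learning_rate=5e-6 \

(a sketch; the exact neighboring flags depend on the notebook version, but these two lines are what the old method differs by).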

r8200 commented 2 years ago

In my experience, overtraining looks different. It is also worth bearing in mind that the training was performed on the default (recommended) settings. And if the defaults are already too much, how is one supposed to raise the quality further?

And speaking of the old version: it's not about train_text_encoder and the learning rate; the problem is complex and did not appear yesterday with the latest improvements. The build from October 15 was a good solution; I have several well-trained models from that time that are free of the problems we are discussing.

Overtrained examples: [images]

r8200 commented 2 years ago

> it's the new text encoder training option that might've changed things, but the results are actually better. Keep the steps as low as 1500 and then play with the positive/negative prompts.

Experimenting with keyword weights does not give the desired effect. At zero weight the likeness goes away, but the style does not return. Strengthening other words doesn't have enough effect and rather causes glitches and chaos. Negative prompts push the result somewhere off to the side, again without bringing back the desired style.

Dreambooth now behaves more like a Hypernetwork, where likeness and style travel together with no way of mixing them well.

mauzus commented 2 years ago

> Overtrained examples: [images]

Yes, I know those overtrained-model issues very well, but yesterday I had exactly the other issues you mentioned, because I was training with too many images and/or too many steps.

Once I started training with just 10 pictures and 1500 steps, everything went back to normal, with much better results than I ever had before.

r8200 commented 2 years ago

I tested the model on a small number of photos: 10/200 photos, 1500 steps, 1e-6. The problem hasn't gone anywhere; character recognition has gotten worse, all the artifacts remain, the styles still disappear. I would like to finally be heard: it's not about the samples and their number, and it's not overtraining; it's more like over-weighting. What is happening has nothing to do with the number of samples, and the problem is not in the new 1.5 model; it appeared earlier.

You can do a simple test with your model. Prompt: <dreambooth> epic portrait, by Alex Grey and Lebbeus Woods, psychodelic dream, dispertion colors, vibration. CFG Scale 15.

And this is the kind of result you should expect. Model 1.5 without training: [image grid-1709]

What the old colab (dated Oct 15) produced. Model 1.4 training, 100/300 photos, 6000 steps, lr 5e-7: [image grid-1711]

What the current colab (dated Oct 24) produces. Model 1.5 training, 10/200 photos, 1500 steps, lr 1e-6 (there are more examples with more samples and photos in the discussion above): [image grid-1705]

GuruVirus commented 2 years ago

Thanks for everyone's insight. I thought I was going crazy with my new 1.5 training and mediocre to poor results.

What I think is happening is that 1.5 training puts a much higher weight on the trained token, so your invoke string carries a high default weight. Try adding a lot of weight to your other prompt elements and reducing the weight of the primary token. There's a fine balance before it meshes, for me.

When I was doing facial merging, what worked best for me was this prompt: [myToken:faceToMerge:0.75]. (In the webui's prompt-editing syntax, that renders myToken for the first 75% of the sampling steps, then switches to faceToMerge.)

TheLastBen commented 2 years ago

@r8200 I'll experiment more with the learning rate; if it makes a big difference, I will simply add a dropdown menu to choose between learning rates.

giteeeeee commented 2 years ago

@TheLastBen

Just a quick question: when was "--train_text_encoder" added to this colab notebook? I checked a copy of the notebook (SD 1.4) from three days ago and "--train_text_encoder" is already in the code, but you've been saying it was newly added with 1.5, so I'm a bit confused.

TheLastBen commented 2 years ago

it was added a day before 1.5 was released; there is also the vae, which was added after that.
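For context on what that VAE hookup does, here is a minimal diffusers sketch of overriding a pipeline's VAE (the model IDs are illustrative examples, not necessarily what the notebook uses):

import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# example external VAE repo; the notebook may pull a different one
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # base model
    vae=vae,                           # replaces the bundled VAE
    torch_dtype=torch.float16,
).to("cuda")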

giteeeeee commented 2 years ago

> it was added a day before 1.5 was released; there is also the vae, which was added after that.

@TheLastBen Then is there also a way to disable the vae?

Don't worry if it requires changing the notebook; I just wanted to test things out a bit.

TheLastBen commented 2 years ago

@giteeeeee create a new cell and use this code to download the model without the vae:

import os
from IPython.display import clear_output

Huggingface_Token = "" #@param {type:"string"}
token = Huggingface_Token

if token != "":
  # start from a clean folder
  if os.path.exists('/content/stable-diffusion-v1-5'):
    !rm -r /content/stable-diffusion-v1-5
  clear_output()
  %cd /content/
  !mkdir /content/stable-diffusion-v1-5
  %cd /content/stable-diffusion-v1-5
  !git init
  !git lfs install --system --skip-repo
  !git remote add -f origin "https://USER:{token}@huggingface.co/runwayml/stable-diffusion-v1-5"
  # sparse checkout: list every top-level item EXCEPT the vae folder
  !git config core.sparsecheckout true
  !echo -e "feature_extractor\nsafety_checker\nscheduler\ntext_encoder\ntokenizer\nunet\nmodel_index.json" > .git/info/sparse-checkout
  !git pull origin main
  if os.path.exists('/content/stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin'):
    !rm -r /content/stable-diffusion-v1-5/.git
    %cd /content/
    print('\033[1;32mDONE !')
r8200 commented 2 years ago

@TheLastBen Tell me how to run the old colab from the version archive; when I try to run it in the cloud, it gives an error. Can the old version be used at all?

Steps: 0% 1/2000 [00:09<5:09:36, 9.29s/it, loss=0.37, lr=5e-6]
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 694, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 654, in main
    if args.save_n_steps >= 200:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
Steps: 0% 1/2000 [00:09<5:14:28, 9.44s/it, loss=0.37, lr=5e-6]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-4', '--instance_data_dir=/content/temnikova', '--class_data_dir=/content/class_female', '--output_dir=/content/models/elenatemnikova', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of a elenatemnikova person', '--class_prompt=photo of a person', '--seed=1125', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000', '--num_class_images=300']' returned non-zero exit status 1.
Something went wrong

TheLastBen commented 2 years ago

you can use the new colab, just remove --train_text_encoder \ in the "Start Dreambooth" cell and change the learning rate back to 5e-6

r8200 commented 2 years ago

> you can use the new colab, just remove --train_text_encoder \ in the "Start Dreambooth" cell and change the learning rate back to 5e-6

I can't; there are errors there too. I just deleted the line that was mentioned:

Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 694, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 521, in main
    train_dataset, batch_size=args.train_batch_size, shuffle=True, collate_fn=collate_fn
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 353, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/sampler.py", line 108, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--save_starting_step=500', '--save_n_steps=0', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/data/pohui', '--output_dir=/content/models/pohui', '--instance_prompt=', '--seed=96576', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--center_crop', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.
Something went wrong

UPD: there were some glitches; on the second attempt the process started.

TheLastBen commented 2 years ago

make sure there are no spaces after the arguments --image_captions_filename \ and so on
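A generic illustration of why that matters (not the notebook's exact cell): in a multi-line shell command, the backslash only continues the line when it is the very last character, so a trailing space turns "\ " into an escaped space and ends the command early:

# correct: "\" is the last character on each continued line
!accelerate launch train_dreambooth.py \
  --image_captions_filename \
  --learning_rate=5e-6

# broken: "--image_captions_filename \ " (note the trailing space) would end
# the command after that flag, and the next line would run as a separate,
# invalid command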

r8200 commented 2 years ago

> you can use the new colab, just remove --train_text_encoder \ in the "Start Dreambooth" cell and change the learning rate back to 5e-6

This method gives the same bad result as yesterday and the day before. So I will repeat my question for the third time: how can I run the old version of the colab? Or has one of the components been updated so that the old colab can no longer work?

alejobrainz commented 2 years ago

I'm experiencing the same problems, unfortunately. Is there a way to use the 1.4 model with the new method?

giteeeeee commented 2 years ago

> This method gives the same bad result as yesterday and the day before. So I will repeat my question for the third time: how can I run the old version of the colab? [...]

Use a copy of the colab notebook from a day ago (the newest version has a bug that breaks the old method with class images).

then change the cell as shown here: [screenshot]

I've been running the old settings like this without any problem.

I understand your frustration, as I've experienced similar issues with the new text encoder change. But maybe we can be a little less demanding? He's been working hard on the repo and doing it for free.

Really appreciate your work. @TheLastBen

alejobrainz commented 2 years ago

Agreed. The notebook is absolutely fantastic! @TheLastBen is a superstar! I will give the above a shot. I suppose that change requires using class images and the old workflow, right?

giteeeeee commented 2 years ago

> Agreed. The notebook is absolutely fantastic! [...] I suppose that change requires using class images and the old workflow, right?

Yes that's correct, but with more training steps.

TheLastBen commented 2 years ago

> I'm experiencing the same problems, unfortunately. Is there a way to use the 1.4 model with the new method?

Use this to download the 1.4 model:

import os
from IPython.display import clear_output

Huggingface_Token = "" #@param {type:"string"}
token = Huggingface_Token

if token != "":
  if os.path.exists('/content/stable-diffusion-v1-5'):
    !rm -r /content/stable-diffusion-v1-5
  clear_output()
  %cd /content/
  # keep the same local folder name so the rest of the notebook finds the model,
  # but pull from the 1.4 repo (the original paste still pointed at the 1.5 repo)
  !mkdir /content/stable-diffusion-v1-5
  %cd /content/stable-diffusion-v1-5
  !git init
  !git lfs install --system --skip-repo
  !git remote add -f origin "https://USER:{token}@huggingface.co/CompVis/stable-diffusion-v1-4"
  !git config core.sparsecheckout true
  # same sparse checkout as above: everything except the vae folder
  !echo -e "feature_extractor\nsafety_checker\nscheduler\ntext_encoder\ntokenizer\nunet\nmodel_index.json" > .git/info/sparse-checkout
  !git pull origin main
  if os.path.exists('/content/stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin'):
    !rm -r /content/stable-diffusion-v1-5/.git
    %cd /content/
    print('\033[1;32mDONE !')