TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License

Has something been changed? #1609

Open Deexaw opened 1 year ago

Deexaw commented 1 year ago

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong. After loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps, with the same results; I even tried with 10 pics and still the same. I usually train a face with 30-70 photos, a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken. I even tried training the face over another model, still the same. Has anyone had the same problem? Sorry for my eng :) [image]

Deexaw commented 1 year ago

And what is this? It's the first time I've seen it. [image]

TheLastBen commented 1 year ago

And what is this? It's the first time I've seen it. [image]

That's just the model downloading

TheLastBen commented 1 year ago

Try the latest colab, the one you're using might be broken

Omenizer commented 1 year ago

I'm having the same issue today, even with an empty prompt I'm getting only variations of my instance images 🤷🏼‍♂️

Deexaw commented 1 year ago

Try the latest colab, the one you're using might be broken

Well, I always use the latest version, and I even checked it today. Tried again and got the same awful results. Idk...

Bullseye-StableDiffusion commented 1 year ago

Try the latest colab, the one you're using might be broken

Hello! The problem comes from one of your recent updates. I know 100% that the commit "e59c0fc - fix RAM problem for good, thanks to @Daviljoe193" was working very well. I don't really know what you changed after that, but now the model only reproduces the images used in training (distorted and weird). I think the solution is very simple: just revert everything to that last working commit.

Edit: With exactly the same settings as before, the models are now a joke.

Omenizer commented 1 year ago

Oh, yes RAM is fine now! Been having fun with merges again 👍

hidecreature commented 1 year ago

I'm having the same issue today, even with an empty prompt I'm getting only variations of my instance images 🤷🏼‍♂️

Yes. I'm using the latest colab and today I have the same problem: the prompt does not work properly. It only generates odd variations of the instance images.

csilv commented 1 year ago

Also having this issue fwiw.

Bullseye-StableDiffusion commented 1 year ago

As I said, the easiest solution (for Ben) would be to revert everything he did in the last 2 days. I don't know what he did, but even the checkpoint merger doesn't work anymore ("cuda out of memory" error for the same models I tried a few days ago, when everything worked very well).

order661015 commented 1 year ago

I thought I was losing my mind. Glad it wasn't me, though I have burnt through a lot of Colab compute thinking it was. Should have checked here first!

Daviljoe193 commented 1 year ago

Since I got mentioned here (I guess), I'll weigh in on this issue.

People say that the last "good" commit was the one where my suggestion for fixing the memory leak was applied. Let's go through the 14 commits between then and now, commit by commit. Note: I'm NOT a developer here; despite my mention in a few commits, I'm just a clown that happens to look like a developer.


Commit 1

![ep1](https://user-images.githubusercontent.com/67191631/220242189-01d4adb4-f42d-4abb-bf9f-464f5c0064d0.png)

Pretty easy here. The learning rate for UNet training was lowered from 5e-6 for 1000 steps to 2e-6 for 1500 steps. Realistically, this should reduce any weird artifacting and reduce the likelihood of overfitting, at the cost of a small increase in training time.
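
Roughly, that change amounts to the following shift in hyperparameters (a sketch with made-up variable names, not the notebook's actual code):

```python
# Illustrative sketch of the commit-1 change: a lower UNet learning rate over
# more steps. The dict keys are made up, not the notebook's actual variables.
old_unet = {"learning_rate": 5e-6, "training_steps": 1000}
new_unet = {"learning_rate": 2e-6, "training_steps": 1500}

# Smaller per-step updates spread over a longer run should, in theory,
# reduce artifacting and overfitting at the cost of extra training time.
for name, cfg in {"old": old_unet, "new": new_unet}.items():
    print(f"{name}: lr={cfg['learning_rate']:g}, steps={cfg['training_steps']}")
```
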
Commit 2-3

![ep2](https://user-images.githubusercontent.com/67191631/220242615-bf1f8597-5649-47aa-8f31-53c799e23a5f.png) ![ep3](https://user-images.githubusercontent.com/67191631/220242653-ef4fdfb0-2f34-4fa8-b475-4b84d75d6e17.png)

This one isn't too hard to explain either. A tarball containing the dependencies for Dreambooth/Automatic1111 was updated to use Python 3.10 instead of Python 3.9, and while the old tarball is still there on Hugging Face, there shouldn't be anything here that could break anything.
Commit 4-5

![ep4](https://user-images.githubusercontent.com/67191631/220243171-cdfba06a-5543-4f0d-a48f-386ef3318364.png) ![ep5](https://user-images.githubusercontent.com/67191631/220243177-fa3f9f2f-8eb3-49a4-803c-599465375fb5.png)

These commits are related, so I have to cover them together. The first part just suppresses a pointless "warning" that "warns" that there's a cool shiny thing we could use but don't need. The second _appears_ to be (I'm just a guy, not the maintainer or a Python expert) prefetching a dependency for the webui, plus maybe some mild code cleanup. Again, nothing here could break anything.
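
(Purely for illustration: silencing a warning in Python generally amounts to something like the snippet below. This is a generic sketch of the mechanism, not the actual code from these commits, and the message pattern is a placeholder.)

```python
# Generic sketch of suppressing a non-actionable warning in Python; NOT the
# actual code from commits 4-5. The message pattern below is a placeholder.
import warnings

warnings.filterwarnings(
    "ignore",
    message=".*cool shiny thing we don't need.*",  # placeholder pattern
)
```
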
Commit 6

![ep6](https://user-images.githubusercontent.com/67191631/220243872-839ed092-388c-43ad-a021-40cec786f858.png)

That same prefetch from commit 5, but applied to the dedicated SD notebook.
Commit 7

![ep7](https://user-images.githubusercontent.com/67191631/220244034-6065f62f-a1ad-4dbd-b67f-4725a54f6ac7.png)

More guesswork is needed here. It seems that all this change does is tell the webui not to attempt to fetch the stuff that commits 5-6 have already fetched.
Commit 8

![ep8](https://user-images.githubusercontent.com/67191631/220244270-73291f6a-e05a-4e3e-a16a-7a0cc7fd0170.png)

Commit 7, but for Dreambooth.
Commit 9

![ep9](https://user-images.githubusercontent.com/67191631/220244333-5237722f-dcbd-45ac-a368-86bc243cfda1.png)

A fix for an error I could've sworn I've seen on the issues page before, but can't seem to find at the moment. 100% couldn't affect training, though.
Commit 10-12

![ep10](https://user-images.githubusercontent.com/67191631/220244713-ed25ccc1-26fb-4ae0-842b-0ebde50d8c7a.png) ![ep11](https://user-images.githubusercontent.com/67191631/220244758-c3b5ab67-6504-46f4-bee6-af281ffb4ad4.png) ![ep12](https://user-images.githubusercontent.com/67191631/220244769-09acc3f1-4902-4660-bfde-3972dceddc7a.png)

Self-explanatory: adds everything needed for ControlNet to work, and it's only applied to the Automatic1111 notebook, not the Dreambooth notebook.
Commit 13

![ep13](https://user-images.githubusercontent.com/67191631/220245247-1e8eaa1a-7033-4af2-9472-bc21b50a0514.png)

Self-explanatory: it just fixes a problem with resuming training on 768 v2.x models in Dreambooth. Likely not what's causing people trouble here.
Commit 14

![ep14](https://user-images.githubusercontent.com/67191631/220245577-b7b2e01e-dd8a-46a1-bc30-c012ef47c599.png)

Uh, it's a 4-character change to the URL that gets git cloned. Nothing to see here.

Not trying to invalidate what people have said here about having trouble with overfitting, since I too have had a ton of trouble getting anything that's not either overfitted or ugly (though I still don't fully understand Dreambooth's settings, and I haven't trained any models in over a month), but nothing of significance has changed in the last 14 commits.

Deexaw commented 1 year ago

So what do you suggest? How do we train over custom models now if it's always overfitting? I never had such a problem since I started using it. The last time I trained without any problem was Feb 18; then, the next day, it all started.

iqddd commented 1 year ago

Something was definitely broken.

PS: I recently ran the training again with the same dataset and parameters. The loss value varies between 0.1 and 0.7. Looks like the problem is fixed.

Deexaw commented 1 year ago

[image] So I tried it again today with a 1e-6 text encoder learning rate and 10 photos. It was a mess. Tried it now with the same photos at 4e-7 and it's better.

TheLastBen commented 1 year ago

There are no standard settings for all datasets; you have to find the right settings for your dataset.
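
As a rough sketch of that trial-and-error, here is one way to lay out a small grid of UNet learning rates and step counts to try one run at a time (the values are ones mentioned in this thread, not recommendations):

```python
# Sketch only: enumerate candidate UNet learning rates and step counts, since
# no single setting fits every dataset. Values are ones mentioned in this
# thread, not recommendations.
from itertools import product

unet_lrs = [5e-6, 2e-6, 1e-6]
step_counts = [500, 650, 1500]

for lr, steps in product(unet_lrs, step_counts):
    print(f"candidate run: UNet lr={lr:g}, steps={steps}")
```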

Deexaw commented 1 year ago

I think it's something with the UNet; it became more sensitive. Now I used 23 photos with only 650 steps, the text encoder even at 1e-6, and it's OK. Usually I would use 2300 steps for that...

TheLastBen commented 1 year ago

Yes it became more efficient, so you don't need 4000 steps to train on a single subject

Deexaw commented 1 year ago

Yes it became more efficient, so you don't need 4000 steps to train on a single subject

Oh, if only you had said that earlier 😄 Where can we read about those updates?

Bullseye-StableDiffusion commented 1 year ago

I have finally made it work as before. Check out this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb It took me some good hours to get it working, because I'm no coder. I used my logical thinking to revert everything to a stable version. I don't know how to fork at a certain commit, so I downloaded the older commit, uploaded it to a new repository, and modified everything accordingly.

Edit: No more distortions and weird images. Add the names of the .jpg files in the prompts to get more of your character's characteristics.
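
For anyone who wants to do the same without re-uploading files by hand, a rough sketch of pinning a clone to the last-known-good commit (the e59c0fc hash comes from earlier in this thread; the folder name is arbitrary):

```python
# Rough sketch: clone the repo and check out the last-known-good commit
# (e59c0fc, mentioned earlier in this thread) instead of re-uploading files.
# The target folder name is arbitrary.
import subprocess

repo_url = "https://github.com/TheLastBen/fast-stable-diffusion"
subprocess.run(["git", "clone", repo_url, "fast-stable-diffusion-pinned"], check=True)
subprocess.run(["git", "checkout", "e59c0fc"], cwd="fast-stable-diffusion-pinned", check=True)
```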

Deexaw commented 1 year ago

Thanks mate! going to test it today

yengalvez commented 1 year ago

I have finally made it work as before. Check out this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb It took me some good hours to get it working, because I'm no coder. I used my logical thinking to revert everything to a stable version. I don't know how to fork at a certain commit, so I downloaded the older commit, uploaded it to a new repository, and modified everything accordingly.

Edit: No more distortions and weird images. Add the names of the .jpg files in the prompts to get more of your character's characteristics.

THANKS!!! IT WORKS AMAZINGLY! Finally! I was trying everything, but the new version always deforms faces, even with low steps.

iqddd commented 1 year ago

Training SD2.1 still gives terrible results. 00001-3962062657 What the hell does this look like? Undertraining or overtraining? UNet or text encoder?

InternalMegaT commented 1 year ago

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong. After loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps, with the same results; I even tried with 10 pics and still the same. I usually train a face with 30-70 photos, a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken. I even tried training the face over another model, still the same. Has anyone had the same problem? Sorry for my eng :) [image]

This has been happening to me as well. I trained my model again, even though it was working fine a few days ago. These are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001

00002-1610652998

iqddd commented 1 year ago

Guys, what are your base models? With SD1.5, it probably still works correctly. With SD2.1 (512 or 768), it gets terrible results. Moreover, the larger the dataset size, the worse the resulting model. The picture I gave above is generated by a model trained on 180 photos.

Deexaw commented 1 year ago

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong. After loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps, with the same results; I even tried with 10 pics and still the same. I usually train a face with 30-70 photos, a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken. I even tried training the face over another model, still the same. Has anyone had the same problem? Sorry for my eng :) [image]

This has been happening to me as well. I trained my model again, even though it was working fine a few days ago. These are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001

00002-1610652998

Just use fewer UNet and text encoder steps; it became more sensitive somehow

iqddd commented 1 year ago

It seems to me that the issue is not about the number of steps. In the last example, I used only 30 steps per image. The result is still terrible.

InternalMegaT commented 1 year ago

Guys, what are your base models? With SD1.5, it probably still works correctly. With SD2.1 (512 or 768), it gets terrible results. Moreover, the larger the dataset size, the worse the resulting model. The picture I gave above is generated by a model trained on 180 photos.

I'm using 2.1 (768) and it's getting trash results; however, 3 days ago it was perfectly fine with the same dataset and training.

InternalMegaT commented 1 year ago

Hi, I have been training faces on the Realistic Vision model for about a week and the results were always good, but today something is wrong. After loading the trained model into Stable Diffusion, it just generates photos based on my instance images, and prompts don't work the way they did before. I thought I had overtrained it, so I tried again with 25 images and fewer steps, with the same results; I even tried with 10 pics and still the same. I usually train a face with 30-70 photos, a 5e-6 UNet learning rate and a 1e-6 text encoder learning rate; yesterday I tried 2e-6 UNet and 1e-6 text and the results were amazing. Today it just seems broken. I even tried training the face over another model, still the same. Has anyone had the same problem? Sorry for my eng :) [image]

This has been happening to me as well. I trained my model again, even though it was working fine a few days ago. These are my results now. This is supposed to be a tiger eating fish in a jungle. 00005-1610653001 00002-1610652998

Just use fewer UNet and text encoder steps; it became more sensitive somehow

I will just roll back a few days; I don't want to figure out this new way to train models.

Deexaw commented 1 year ago

Test this colab from @Bullseye-StableDiffusion - https://colab.research.google.com/github/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb Just testing it right now

InternalMegaT commented 1 year ago

Test this colab from @Bullseye-StableDiffusion - https://colab.research.google.com/github/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb Just testing it right now

I'm just using the revision history on the main colab

InternalMegaT commented 1 year ago

Guys, what are your base models? With SD1.5, it probably still works correctly. With SD2.1 (512 or 768), it gets terrible results. Moreover, the larger the dataset size, the worse the resulting model. The picture I gave above is generated by a model trained on 180 photos.

This won't help at all since my dataset is 6 GB in size. Bugs are weird. Update: This is not a bug, but it is an issue.

InternalMegaT commented 1 year ago

Yes it became more efficient, so you don't need 4000 steps to train on a single subject

In my opinion this change is not good. The least you could do is tell us what the settings are now as opposed to then. Is 3000 steps now 300 steps?

TheLastBen commented 1 year ago

will fix soon

InternalMegaT commented 1 year ago

will fix soon

Oh I thought you said this was an intentional feature. Sorry

TheLastBen commented 1 year ago

@InternalMegaT it is supposed to make the training more efficient, but it appears to make it unstable in some cases, so I reverted a few deps; it should be all good now.

InternalMegaT commented 1 year ago

@InternalMegaT it is supposed to make the training more efficient, but it appears to make it unstable in some cases, so I reverted a few deps; it should be all good now.

Got it. May I ask, what was it supposed to do?

TheLastBen commented 1 year ago

you can get results with 500 steps instead of 4000

Isaac-DR commented 1 year ago

I have trained faces with the Realistic Vision model.

Could you please tell me how you use this model? When I try to load it I get a conversion error. Thx

Omenizer commented 1 year ago

So has there been a change or just new lower defaults?

Bullseye-StableDiffusion commented 1 year ago

I have finally made it work as before. Check out this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb It took me some good hours to get it working, because I'm no coder. I used my logical thinking to revert everything to a stable version. I don't know how to fork at a certain commit, so I downloaded the older commit, uploaded it to a new repository, and modified everything accordingly. Edit: No more distortions and weird images. Add the names of the .jpg files in the prompts to get more of your character's characteristics.

THANKS!!! IT WORKS AMAZINGLY! Finally! I was trying everything, but the new version always deforms faces, even with low steps.

Thanks for confirming! And yes, that problem from the newest version was kinda annoying.

Bullseye-StableDiffusion commented 1 year ago

So has there been a change or just new lower defaults?

It seems that only 2 settings are changed in the latest notebook:

InternalMegaT commented 1 year ago

Why are my results still so bad? I'm using the same settings as I did before. It's just not working well anymore. It's not my dataset that's the problem.

Bullseye-StableDiffusion commented 1 year ago

Why are my results still so bad? I'm using the same settings as I did before. It's just not working well anymore. It's not my dataset that's the problem.

Try this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb. I reverted everything to how it was 2-3 days ago and it's working very well with the default settings.

InternalMegaT commented 1 year ago

Why are my results still so bad? I'm using the same settings as I did before. It's just not working well anymore. It's not my dataset that's the problem.

Try this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb. I reverted everything to how it was 2-3 days ago and it's working very well with the default settings.

Why do you keep saying that? I already tried that. Still the same thing.

Bullseye-StableDiffusion commented 1 year ago

Why are my results still so bad? I'm using the same settings as I did before. It's just not working well anymore. It's not my dataset that's the problem.

Try this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb. I reverted everything to how it was 2-3 days ago and it's working very well with the default settings.

Why do you keep saying that? I already tried that. Still the same thing.

Then I don't know. The problem may be with your training dataset, settings, prompts, etc.

InternalMegaT commented 1 year ago

Why are my results still so bad? I'm using the same settings as I did before. It's just not working well anymore. It's not my dataset that's the problem.

Try this notebook: https://github.com/Bullseye-StableDiffusion/fixed/blob/main/fast_DreamBooth_fixed.ipynb. I reverted everything to how it was 2-3 days ago and it's working very well with the default settings.

Why do you keep saying that? I already tried that. Still the same thing.

Then I don't know. The problem may be with your training dataset, settings, prompts, etc.

It worked fine before, I didn't change anything.

TheLastBen commented 1 year ago

@Omenizer I changed the transformers version back to an older one
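
For reference, pinning a dependency like that boils down to something along these lines; the version number below is only a placeholder, since the thread doesn't say which release was restored:

```python
# Sketch of pinning transformers to an older release. The exact version that
# was restored isn't stated in this thread; 4.25.1 is only a placeholder.
import subprocess, sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "transformers==4.25.1"],
    check=True,
)
```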

Omenizer commented 1 year ago

@Omenizer I changed the transformers version back to an older one

Just tried it, much better now! 🥳🥳🥳

Fubu4u2 commented 1 year ago

@Omenizer I changed the transformers version back to an older one

Works great. Thank you for all the time and effort you put into this, man.