Training instances and concepts simultaneously. Am I doing it wrong?

cosimo commented 1 year ago

Hi!

I've been using these notebooks for a month or two now. Initially I was getting encouraging results, but for the past couple of weeks, training the SD1.5 model has become much more problematic.

What I usually do is train on both instance pictures (3-5 pics) and concept pictures (~30-50 of them), using both the instance picture upload cell and the concept picture upload cell. Because this worked in earlier versions of the fast-stable-diffusion notebook, I assumed that the filename of the pictures would determine the word/token associated with the person (e.g. svxzyk) and the concept (e.g. MyStyleAvatar).

By "earlier versions" of the notebooks, I mean the ones that did not have a separate cell for concept images and based the training on the 0-100 slider: values of 10-20 trained a specific person (instance), and values of ~50-70 trained a style (concept).

Am I doing this wrong now? I started to suspect something is wrong because I can't generate images based on the style/concept. It's like my style is not even recognized or was not trained. I can see that the instance pictures have been trained and associated with the svxzyk token, but I can't say the same for MyStyleAvatar. Should I run the training separately for instance images and then in a different run, for concept images? If so, what text/token will be associated with the style?

Thanks for any help with this!

BTW, I just sponsored this repo, thanks for this work!

TheLastBen commented 1 year ago

the concept images should not be treated like instance images. put all the images that contain the subject you want to train in the instance images folder. the concept images help de-overfit the text_encoder in case it's overfit; avoid them if your model isn't suffering from that.

put all the instance images using the instance images cell
skip the concept cell
unet training 2000
unet learning rate 4e-6 (if instance images >30, otherwise 3e-6)
text_encoder 350 steps
leave the text_encoder learning rate at its default.
after testing, resume the unet training for 500-1000 steps until you get the desired results
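
For reference, the settings above can be written down as a small sketch. This is only a summary of the numbers in this comment. The function and key names (`recommended_settings`, `unet_training_steps`, and so on) are hypothetical and are not the notebook's actual parameter names.

```python
def recommended_settings(num_instance_images: int) -> dict:
    """Collect the values suggested in this thread into a plain dict.

    All key names are illustrative; they do not mirror the notebook's fields.
    """
    return {
        "use_concept_images": False,          # skip the concept cell
        "unet_training_steps": 2000,
        # 4e-6 when there are more than 30 instance images, otherwise 3e-6
        "unet_learning_rate": 4e-6 if num_instance_images > 30 else 3e-6,
        "text_encoder_training_steps": 350,
        "text_encoder_learning_rate": None,   # None = leave at the default
    }


def total_unet_steps(resume_rounds: int, steps_per_round: int = 500) -> int:
    """Total unet steps after resuming training `resume_rounds` times.

    Each resume adds 500-1000 steps on top of the initial 2000.
    """
    if not 500 <= steps_per_round <= 1000:
        raise ValueError("resume in chunks of 500-1000 steps")
    return 2000 + resume_rounds * steps_per_round


if __name__ == "__main__":
    print(recommended_settings(num_instance_images=5))
    print(total_unet_steps(resume_rounds=2, steps_per_round=500))
```

how many times to resume depends on testing the results after each round, so `resume_rounds` is something you decide by eye, not a fixed value.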

thanks for the support!

cosimo commented 1 year ago

Hi TheLastBen,

thanks for your quick reply on this!

So, as I understand it, even if I'm trying to train the model on both a specific person and a new style (MyStyleAvatar), I should upload both sets of pictures as instance pictures? I thought the separate "concept" pictures/upload was for training the model on a specific style (e.g. Van Gogh pictures, or the particular style of avatars I want to generate).

I suppose I can just try that. I'm still unsure about the point of the concept images. Perhaps they are the regularization pictures that were bundled previously?

TheLastBen commented 1 year ago

concept images act as regularization for the text encoder, as they widen the concept range once it's been narrowed by the instance images, but in your case you won't really need them. a style and a person should both be treated as instances.
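
As a rough illustration of treating a person and a style as instances in one folder, the sketch below groups filenames by the token they would carry. It assumes a naming convention such as `svxzyk (1).jpg` or `MyStyleAvatar_2.png`, where a trailing counter follows the token. That convention and the helper names are assumptions for illustration; the thread does not show how the notebook actually parses filenames.

```python
import re
from collections import defaultdict
from pathlib import Path


def token_from_filename(filename: str) -> str:
    """Guess the training token from an image filename (assumed convention)."""
    stem = Path(filename).stem
    # Drop a trailing counter such as " (3)", "_3" or "-3".
    # Caveat: a token that itself ends in digits would be shortened too.
    return re.sub(r"[\s_-]*\(?\d+\)?$", "", stem).strip()


def group_by_token(filenames: list) -> dict:
    """Map each token to the instance images that share it."""
    groups = defaultdict(list)
    for name in filenames:
        groups[token_from_filename(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    instance_files = [
        "svxzyk (1).jpg",
        "svxzyk (2).jpg",
        "MyStyleAvatar_1.png",
        "MyStyleAvatar_2.png",
    ]
    for token, files in group_by_token(instance_files).items():
        print(token, len(files))
```

both groups would be uploaded through the instance images cell, with the concept cell left empty.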

cosimo commented 1 year ago

Great stuff, thanks!!

GuganKailasam commented 1 year ago

@TheLastBen

I have a question about concept images. Please share your view on it.

I understand that they are for de-overfitting and widening the view of the model. Can you give an example of what the concept images could be for a given scenario?

Say I am training my model to generate an iPhone 15, so I add those images as instance images. In this case, or in any other example case, what concept images should I use?

Also, for instance images, the prompt would be the name of the image file. What is the prompt for concept images?

TheLastBen commented 1 year ago

If you're training on a specific object, don't use concept images; they don't really help the training.

TheLastBen / fast-stable-diffusion, issue #1105