Guizmus / sd-training-intro

This is a guide that presents how fine-tuning Stable Diffusion models works. It is an entry-level guide for newcomers, but it also gathers most of the concepts of training in a single place.

Unwanted association. #2

Open leucome opened 1 year ago

leucome commented 1 year ago

I am trying to train a LoRA with DreamBooth on Automatic1111. It is for a character. She has cat ears, a tail, and an outfit that I want to be included in the final model. Ideally I want it to learn the outfit and the cat ears too, either as a whole character or by separating the elements... I am new to this, so I am not sure what is the most effective approach. There is a video with that specific character here; the one I am trying to train is the girl on the left in that video. https://www.youtube.com/watch?v=jFNiHZNtRz0

The character being 3D makes it easy to create a set from scratch, so I am pretty confident I can make a good set of pictures.

So far it is not working that great. Here are a couple of issues I have.

First issue: the final model always ends up strongly associated with purple and pink. There is nothing pink or purple in the set, and there are really few examples with these colors in the classifier images.

Second issue: the model gets strongly associated with cats or animals, so it often turns the whole character into a cat or spawns animals everywhere.

Third issue: since this character has a relatively big head, the model also ends up associated with the word chibi, even though there is no mention of chibi anywhere.

This means I often get chibi Nendoroid figures of an anthropomorphic cat as a result when using the final LoRA, which is not the expected behavior.

On a training run with a lot of images in many poses, it ended up making a model that creates broken poses all the time, though the face, hairstyle, and clothes looked pretty close to the training set.

I made one with the clothes separated and fewer poses. This one has a tendency to generate the clothing part alone.

So is there a way to reduce these unwanted associations during the training?

What kind of words and images would help it learn the character as a whole, with the clothes included?

Are the classifier images supposed to look really similar or completely different?

EDIT: I think her name may cause the weird purple association. She was called vio, and I used viozi as her name in the prompt and token... Maybe this triggers violet colors. There is another training that is less purple, and in this one I called her vilo instead of vio.

... Yeah, I can confirm that using the name viozi as the instance token was most likely causing the color issue. I also think maybe the text learning rate was too low to properly learn that she was meant to be a woman. My last training generates fewer random animals. It still does, but rarely, though it also often fails to generate the animal ears. It still makes chibi sometimes, but way less extreme.

leucome commented 1 year ago

Examples if I do not use negative prompts for purple and pink... These are all examples where I try to make her face look as expected while not caring about color.

Screenshot_20230326_052132

Here is the stuff I get with a generic simple prompt like photo of a girl "mycharacterword"

Screenshot_20230326_054245

Then with other models like chilloumix and darelites it is easier: I only had to ban purple and it is working. I made these to show to my friend, who asked for beach and dress. But with skirt in the prompt it makes clothes in a style similar to my training set. 0028 0009 0004

Here is my training set... Screenshot_20230326_054715

Guizmus commented 1 year ago

> EDIT: I think her name may cause the weird purple association. She was called vio, and I used viozi as her name in the prompt and token... Maybe this triggers violet colors. There is another training that is less purple, and in this one I called her vilo instead of vio.
>
> ... Yeah, I can confirm that using the name viozi as the instance token was most likely causing the color issue. I also think maybe the text learning rate was too low to properly learn that she was meant to be a woman. My last training generates fewer random animals. It still does, but rarely, though it also often fails to generate the animal ears. It still makes chibi sometimes, but way less extreme.

So, yeah, the first thing you discovered here is true: the tokens you choose have an impact on how well you will train. Using a token that already means something in the model means you'll need to train more on it to "overwrite" that previous data. You can test your token quite easily though: just run your tokens in your base model, with nothing more than those. What comes out is "where the model is at" currently for them. So picking a token that doesn't mean anything, or that is already kind of associated with things that are useful to you, can greatly help.
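For example, here is a minimal sketch of that token check using the diffusers library; the model ID and the candidate token list are only assumptions for illustration, and running the bare tokens through the txt2img tab of Automatic1111 achieves the same thing.

```python
# Sketch: generate a few images from each bare candidate token to see what the
# base model already associates with it before any training.
# The model ID and token list are assumptions for illustration.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for token in ["viozi", "vilo", "catgirl"]:
    images = pipe(token, num_images_per_prompt=4, num_inference_steps=25).images
    for i, img in enumerate(images):
        img.save(f"token_check_{token}_{i}.png")
```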

And since this will depend on the base model, you can get different results later on, like you tested out: more or less purple or chibi, for example.

Personally, unless you want to use that LoRA with other catgirls in the same prompt, I would train her on "catgirl". It's a token that is already more or less associated with what you want to train, so it should push in the right direction. To be tested on your preferred models beforehand, like I just explained.

Guizmus commented 1 year ago

Another thing, regarding your dataset this time: I think it could be too big for the diversity you are able to bring in.

50+ pictures for a character that stays in the same outfit and with the same accessories is far too many. Like, you've got 7 close-ups of the same shoes from different directions. This should be only 1 picture in my opinion, because the only diversity in these 7 pics is the background.

You may want some pictures of details like those shoes, or the hair brush, or whatever, but you don't want them to become the focus of your dataset. More focus should be put on pictures of the type you want to see come out of your LoRA. So, unless you really want more close-ups of her shoes, drop some. This applies to other categories too, like bottom-half shots. All in all, I would try to reduce the dataset to around 20 images, with 15 being of the head, the top half of the body, or almost full body, and the last 5 focusing on the details in some close shots.
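As a quick way to keep an eye on that balance, here is a small sketch that tallies a dataset by shot type. It assumes a hypothetical naming scheme where each image filename starts with a prefix such as head_, tophalf_, fullbody_ or detail_; the folder path is also just a placeholder.

```python
# Sketch: count training images per shot type, assuming a hypothetical
# "<shottype>_<n>.png" naming scheme (head_, tophalf_, fullbody_, detail_).
from collections import Counter
from pathlib import Path

dataset_dir = Path("training_images")  # hypothetical dataset folder
counts = Counter(p.stem.split("_")[0] for p in dataset_dir.glob("*.png"))

for shot_type, n in counts.most_common():
    print(f"{shot_type}: {n}")
print(f"total: {sum(counts.values())}")
```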

Another point is the backgrounds. For the ones that include the feet/legs/full body, you may want to position her better, so it feels believable. Otherwise the AI could just learn that your character is to be pasted on top of anything, without caring about placement.

As for the chibi inspiration, I'm not sure, but I think it comes from the framing of those ultra-close head shots you included. The last one has really cool framing, for example, but the 3rd on the last line is too close.

leucome commented 1 year ago

Yes, you are right, I'll need to try with catgirl. And maybe also find a token that already creates a nearly identical outfit. EDIT: found punkrave catgirl... I read the guide a couple of times and figured out that I was using way too many words in the caption, so I replaced them with shorter ones. This helped a lot.

Example:

So for the outfit, maybe I'll also try making it an independent LoRA. This way I could better control its weight in the final picture.

Guizmus commented 1 year ago

Yes, that feels better already!

Separating the clothes into another embedding is a good possibility too, yes.

I'm happy this managed to help you a little :)

leucome commented 1 year ago

Then a little follow-up... It may be useful if somebody else trying the same kind of training finds this post. I managed to get pretty close to the objective of teaching it to reproduce the entire character while avoiding weird associations. Seriously though, the outfit still seems to be the biggest challenge. (I still need to try making it its own LoRA model.) So I did switch to a rare instance token that has no influence, then used short captions so that it can focus attention only on the character. Here is an example: viorin,eyes,head,face,outfit,catgirl.
Then I reduced the number of images to 15.
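For reference, here is a minimal sketch of writing that short caption out as per-image .txt sidecar files. The folder path is hypothetical, and the sidecar convention shown here is the kohya-style one; adjust it to whatever your trainer actually expects for captions.

```python
# Sketch: write the same short caption as a .txt sidecar next to every image,
# the convention used by kohya-style trainers. The folder path is hypothetical.
from pathlib import Path

dataset_dir = Path("training_images")
caption = "viorin,eyes,head,face,outfit,catgirl"

for img in sorted(dataset_dir.glob("*.png")):
    img.with_suffix(".txt").write_text(caption)
```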

A recurring issue I had is that it was able to learn the face in 1500 to 3000 steps... but it knew very little about her complicated outfit. So to get the outfit right I had to overfit, doing 6000 to 9000 steps. At that point the face was starting to go wrong.
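For context, step counts like these depend on the trainer's settings. Here is a rough sketch of the usual arithmetic, assuming a kohya-style images × repeats × epochs / batch size formula; the repeat and epoch numbers are only placeholders chosen to land in the ranges mentioned above.

```python
# Rough sketch of the step arithmetic, assuming a kohya-style
# images * repeats * epochs / batch_size formula. Numbers are placeholders.
def total_steps(num_images, repeats, epochs, batch_size=1):
    return num_images * repeats * epochs // batch_size

print(total_steps(15, 10, 20))  # 3000 steps: roughly where the face is learned
print(total_steps(15, 10, 60))  # 9000 steps: the overfit range used for the outfit
```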

So to fix this I went back to my original idea of adding pictures of the clothes/body alone, though I used full-body shots with no head. It seems that matching the number of head shots is enough to compensate: if we use 5 head shots, then adding 5 body shots seems to keep the training even. With that done, I have a set of 20 pics. I guess that adding pictures of legs to compensate for pictures of the torso would help too, or maybe adding back a couple of full-body poses. I'll have to do more tests.

Here are some examples with the prompt a photo of viorin: (wlop:0.5). Blending with wlop art helps diversify the poses a little, because I still need to overfit to get the blue flower and the clothing right, so when used alone it also tends to replicate the dataset poses... T11

Still, in its current state it would be consistent enough to be used to create something like a visual novel, a comic book, or any kind of work that requires a recurring character.

Other than that, my test to strongly associate her with catgirl did not work properly. It made her look more like a little girl than a catgirl.

Also, she still sometimes turns into a chibi, but this one is most likely unavoidable with this dataset. Seriously, her head is wider than her shoulders; that is probably enough to fall into the chibi category.

I found out that the name viorin, instead of viozi, doesn't trigger violet colors. The reason was that the viozi prompt already generated a purple girl sometimes. With some models other than SD 1.5, for example KotosAbyss, the prompt viozi alone generates a purple catgirl 100% of the time, so it was really easy for the training to make that association.