bmaltais / kohya_ss

Apache License 2.0

[Feature request]: Partial caption dropout #1797

Closed TeKett closed 5 months ago

TeKett commented 6 months ago

Since the caption tells the model "this caption should generate the image it's associated with", it would be good to be able to do partial dropout on the caption to remove or keep a number of tokens at random. Since in theory every image should be able to be generated from any prompt, all we are doing is pointing the model in a direction.

"a girl" should be able to generate the same image as "a girl, standing" and "a girl, sitting". This is only possible if the model knows that; otherwise the former won't generate the same image as the latter two. But this isn't possible without writing multiple captions for the same image.

I tried captioning everything in an image and adding some words for the quality, style, theme, pose, angle, etc., but I then ran into the problem that I needed to use ALL of them to generate my images; if I only used a handful of tokens when generating, I only got subpar images. And if I omit all of them from training, then I can't control them when I generate an image.

So personally I'd like to be able to do partial random dropout on the caption. Using a delimiter, like a comma, I'd like to be able to: 'drop out # tokens', 'keep # tokens', and 'keep % of tokens', and if possible a way to modulate it over time.
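To make the request concrete, here is a minimal sketch of the three modes being asked for. This is purely hypothetical illustration code (the function name and parameters are my own, not part of sd-scripts); it splits a caption on commas and randomly keeps a subset, preserving the original tag order.

```python
import random

def partial_caption_dropout(caption, mode="keep_fraction", n=1, fraction=0.5):
    """Hypothetical sketch of the requested dropout modes; not sd-scripts code.

    mode: "drop_n"        -> drop n random tags
          "keep_n"        -> keep n random tags
          "keep_fraction" -> keep roughly `fraction` of the tags
    """
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    if mode == "drop_n":
        keep_count = max(1, len(tags) - n)        # never drop everything
    elif mode == "keep_n":
        keep_count = min(n, len(tags))
    elif mode == "keep_fraction":
        keep_count = max(1, round(len(tags) * fraction))
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Sample which tags survive, then re-join them in their original order.
    kept = sorted(random.sample(range(len(tags)), keep_count))
    return ", ".join(tags[i] for i in kept)
```

For example, `partial_caption_dropout("a girl, standing, outdoors", mode="keep_n", n=2)` would return a random two-tag subset such as "a girl, outdoors". Modulating it over time could be done by making `fraction` a function of the current epoch.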

DKnight54 commented 6 months ago

@TeKett, you are probably gonna groan when you hear this, but GOOD NEWS! This feature already exists in the base sd-scripts that this GUI is running on. Bad news? It's not included in the GUI anywhere, so you'd have to add it manually under the Additional parameters field in the Advanced Configuration section. The magic keyword is --caption_tag_dropout_rate=X, where X is a number from 0 to 1 indicating the chance of each comma-separated tag dropping out (e.g. 0.1 = 10% chance, 1 = 100% chance). Play around until you find a value that gives you the flexibility you are looking for.

[Image: screenshot of the Additional parameters field, cribbed from bmaltais's YouTube video about training LoRAs.]

DKnight54 commented 6 months ago

It's possibly not as finely controlled as you seem to be looking for, but it should be sufficient to get most of the results you want. And you'd probably want to look at --keep_tokens=X, where X is the number of tokens from the front of the caption to keep and never drop, in case there are specific keywords you want to keep constant.
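As a rough illustration of how the two flags interact, here is a simplified sketch (my own code, not kohya's actual implementation): the first keep_tokens comma-separated tags are always protected, and each remaining tag is independently dropped with probability tag_dropout_rate.

```python
import random

def apply_tag_dropout(caption, tag_dropout_rate=0.1, keep_tokens=1):
    # Simplified sketch of the described behavior, not sd-scripts itself:
    # the first `keep_tokens` tags are always kept; every later tag has an
    # independent `tag_dropout_rate` chance of being removed.
    tags = [t.strip() for t in caption.split(",")]
    protected = tags[:keep_tokens]
    rest = [t for t in tags[keep_tokens:] if random.random() >= tag_dropout_rate]
    return ", ".join(protected + rest)
```

So with keep_tokens=1, a caption like "mychar, red hair, standing" always keeps "mychar", while "red hair" and "standing" each survive a given training step only with probability 1 - tag_dropout_rate.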

TeKett commented 6 months ago

Cool, i'll try it out

TeKett commented 6 months ago

> It's possibly not as finely controlled as you seem to be looking for, but it should be sufficient to get most of the results you want. And you'd probably want to look at --keep_tokens=X, where X is the number of tokens from the front of the caption to keep and never drop, in case there are specific keywords you want to keep constant.

How do I know if it's working?

DKnight54 commented 6 months ago

If you mean whether or not the tags are actually being dropped: unfortunately, AFAIK kohya's sd-scripts doesn't expose that info by default. I've glanced through the code, though, and can assure you the tag-dropping function does exist.

There is a --debug_dataset command line argument that I think will expose what's being trained, but it seems like you may have to manually confirm to continue training. I'm not sure myself, as I have never tried it before.

Otherwise, the only way is to generate images and see how flexible the model is with regard to the tags.