ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

Prompt per image best practice #196

Open rmac85 opened 1 year ago

rmac85 commented 1 year ago

Not exactly a feature request, but when I try to submit a blank issue it takes me to the huggingface diffusers issues, so I want to ask here. I'm wondering about the new prompt-per-image feature. I like the idea, though my first attempt came out somewhat overfit.

Should I include [zwx] in my prompt files, use a different token in each one, or use no token at all?

Also, what impact, if any, does the initial instance prompt have? Can it now be left blank, is it ignored, or is it still required, like a blanket prompt over each individual prompt? I'd like to know the best way to go about things, though I understand some experimenting of my own would help as well.

gleb-akhmerov commented 1 year ago

I would say that the best way to use prompt per image is to describe what you see in each image. For example:

The idea is to disentangle the concept you want to train from the rest of the things in the images, and also to help the model to better understand what each image depicts, how things in the image can interact and so on.

I should also say that the model trains on the whole image, on each part of the prompt. So, it would not only learn to generate the zwx dog, for example, but would also learn to generate the specific park, the couch, the man who played with the dog, etc.

There's nothing special about the zwx dog, but because there are many more examples of it than of a specific couch, the model will remember and generate the dog much better. The park, the couch and the man would be more like extras in a movie.

The really interesting things start to happen when you train the model on a larger dataset of images with detailed prompts. For example, if you used the pokemon-wiki-captions dataset, you would then be able to combine properties of different Pokemon to create new ones.

Regarding tokens like zwx: according to the Diffusers blog post, a special token is not necessary. But being consistent (e.g. using the same words and the same name each time you refer to the dog) definitely helps.
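To make the consistency point concrete, here is a minimal sketch of a check that flags captions which never mention the chosen name. The function name and the sample captions are made up for illustration; they are not part of the training script.

```python
def captions_missing_identifier(captions, identifier):
    """Return the image names whose caption never mentions the identifier."""
    needle = identifier.lower()
    return sorted(
        name for name, text in captions.items() if needle not in text.lower()
    )


# Hypothetical captions, invented for this example.
captions = {
    "pic1.png": "a zwx dog playing in a park",
    "pic2.png": "a zwx dog sleeping on a couch",
    "pic3.png": "my dog running with a man",
}

# pic3.png refers to the dog by a different name, so it is reported.
print(captions_missing_identifier(captions, "zwx dog"))
```

Running a check like this before training is a cheap way to catch a caption where the subject was described with different words.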

The "zwx dog" prompt is an example of a simple way to achieve two things:

Our "zwx dog" is like a "dog", but has certain unique features.

The --instance_prompt option is ignored when using prompt per image, so you can omit it.

rmac85 commented 1 year ago

Thanks, I've also found that this method works best so far, very sound advice. I've been testing it out over the past week or so since my questions.

A few things I would like to add:

Don't use too many images with different prompts. In my experience, zwx or whatever initializer token you use should be in each prompt.

On my first attempt I used 40 images, each with a unique prompt and no initializer token. It overfit very quickly, before it had a chance to learn and grow some wings. I think the cause was either the missing initializer or having only one image per prompt. Maybe I was throwing in too many things at once.

Now my go-to approach is to use fewer prompts, each covering a group of images rather than just one.

So let's say I wanted to train on a boxing match. I might use 4-20 images of a boxer throwing an uppercut, all with the same prompt: "Example of a [zwx] boxer throwing an uppercut".

Then to add more, say a jab, I would add images of a jab and change their prompt to "Example of a [zwx] boxer throwing a jab".

The instance prompt is still required at startup since the script expects it, so I removed the rest of the prompt and left just [zwx].
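A rough sketch of the grouping idea, assuming the images are sorted into one folder per move. The folder layout and the helper name are hypothetical; only the two prompts come from the example above.

```python
from pathlib import Path

# Assumed layout: one folder per group, e.g. data/uppercut/*.png, data/jab/*.png.
GROUP_PROMPTS = {
    "uppercut": "Example of a [zwx] boxer throwing an uppercut",
    "jab": "Example of a [zwx] boxer throwing a jab",
}


def prompts_for_images(image_paths, group_prompts):
    """Map each image path to the prompt of the group (parent folder) it sits in."""
    assigned = {}
    for path in map(Path, image_paths):
        group = path.parent.name
        if group not in group_prompts:
            raise KeyError(f"no prompt defined for group {group!r} ({path})")
        assigned[path.as_posix()] = group_prompts[group]
    return assigned


images = ["data/uppercut/01.png", "data/uppercut/02.png", "data/jab/01.png"]
for image, prompt in prompts_for_images(images, GROUP_PROMPTS).items():
    print(image, "->", prompt)
```

Every image in a group shares one prompt, and the [zwx] token appears in all of them.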

ykurilov commented 1 year ago

How do I use prompt per image in Shivam's repo? I couldn't find it. I mean using captions for each instance image.

rmac85 commented 1 year ago

Add --read_prompts_from_txts to the training command, then add a .txt file for every instance image, named after the image with .txt appended. For example, pic1.png gets pic1.png.txt.
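If you have many images, the caption files can be generated with a small script. This is only a sketch: the helper name, the folder name and the captions are made up, and the only convention taken from above is the file naming (pic1.png gets pic1.png.txt).

```python
from pathlib import Path


def write_caption_files(instance_dir, captions):
    """Write `<image name>.txt` next to each image, e.g. pic1.png -> pic1.png.txt."""
    written = []
    for image_name, caption in captions.items():
        image_path = Path(instance_dir) / image_name
        if not image_path.is_file():
            # Fail loudly instead of leaving a caption without an image.
            raise FileNotFoundError(image_path)
        # Append ".txt" to the full file name, keeping the ".png" part.
        caption_path = image_path.with_name(image_path.name + ".txt")
        caption_path.write_text(caption.strip(), encoding="utf-8")
        written.append(caption_path)
    return written


if __name__ == "__main__":
    # Hypothetical folder and captions; replace with your own.
    demo_captions = {
        "pic1.png": "Example of a [zwx] boxer throwing an uppercut",
        "pic2.png": "Example of a [zwx] boxer throwing a jab",
    }
    if Path("instance_images").is_dir():
        for path in write_caption_files("instance_images", demo_captions):
            print("wrote", path)
```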

wyang22 commented 1 year ago

Did you add this argument yourself? Can you share more details, and how good are the results?