TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth
MIT License

Recommended settings to train a style #2255

Open erikestany opened 1 year ago

erikestany commented 1 year ago

Hi,

I've been searching, but I've found contradictory information. What are the recommended settings to train a style?

Specifically:

- Instance images: what is the optimal number?
- UNet: how many steps per image, and at what learning rate?
- Text encoder: how many steps, and at what learning rate?
- Captions: how do they work, and what impact do they have on the result?
- Concept images: what function do they serve, what should these images look like, and what is the optimal number?
- Text encoder concept: how does this work, and how many steps?
- Model: all these values are independent of the model being trained, right?

Thank you so much.

DarkAlchy commented 1 year ago

In an old post, TLB said to train on 10 images, but no more than 20, and to use no captions and no reg images. It works, but not always; I've had more failures than successes so far.

TheLastBen commented 1 year ago

Try my textual inversion notebook on Runpod; it's a combination of LoRA and textual inversion, and it's perfect for styles.

erikestany commented 1 year ago

Try my textual inversion notebook on Runpod; it's a combination of LoRA and textual inversion, and it's perfect for styles.

Hi,

I have never used Runpod. Where can I find this notebook? Or information to use it?

I guess Runpod is just for training the model, and then it can be downloaded and used in Colab, right? This process can't all be done with Colab, can it?

With DreamBooth, would these be the best parameters to train a style? Images: 10 / UNet: 150 steps per image (2e-6) / Text encoder: 350 steps (1e-6)
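For reference, here is the same proposal written out as a plain summary (just a sketch; the key names below are illustrative placeholders, not the notebook's actual form fields):

```python
# Proposed DreamBooth style-training settings, summarized as a plain dict.
# Key names are illustrative placeholders, not the notebook's actual fields.
style_settings = {
    "instance_images": 10,
    "unet_steps_per_image": 150,
    "unet_learning_rate": 2e-6,
    "text_encoder_steps": 350,
    "text_encoder_learning_rate": 1e-6,
}

for name, value in style_settings.items():
    print(f"{name}: {value}")
```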

Are the text encoder steps independent of the number of images?

Thank you so much!

erikestany commented 1 year ago

In an old post, TLB said to train on 10 images, but no more than 20, and to use no captions and no reg images. It works, but not always; I've had more failures than successes so far.

Which post is that?

Which UNet and Text Encoder settings worked best for you?

Thanks.

DarkAlchy commented 1 year ago

https://github.com/TheLastBen/fast-stable-diffusion/discussions/1798

I used all defaults.

TheLastBen commented 1 year ago

@erikestany you don't need Colab if you use Runpod; it's got all the notebooks: https://www.runpod.io/console/gpu-browse?template=runpod-stable-unified

erikestany commented 1 year ago

@erikestany you don't need Colab if you use Runpod; it's got all the notebooks: https://www.runpod.io/console/gpu-browse?template=runpod-stable-unified

I tried your textual inversion notebook on Runpod, but the model doesn't seem to learn anything from the style of the images. What have I done wrong? I used 10 images at 768px to train the v1.5 model for 100 epochs. I tried again with 200 epochs and nothing changed.

On the other hand, with DreamBooth, are the text encoder steps independent of the number of images? Is 350 steps (1e-6) a good value for both 10 and 20 images when training a style?
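To make the question concrete, this is the arithmetic I'm assuming (a sketch; the 150 steps per image comes from my earlier message, and whether the 350 really stays fixed is exactly what I'm asking):

```python
# Sketch: how total steps would scale for 10 vs 20 images, assuming UNet steps
# are counted per image while text encoder steps stay at a fixed total.
UNET_STEPS_PER_IMAGE = 150   # at 2e-6, from the settings proposed above
TEXT_ENCODER_STEPS = 350     # at 1e-6; assumed fixed regardless of image count

for n_images in (10, 20):
    unet_total = n_images * UNET_STEPS_PER_IMAGE
    print(f"{n_images} images -> UNet {unet_total} steps, text encoder {TEXT_ENCODER_STEPS} steps")
```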

Thank you so much.

TheLastBen commented 1 year ago

Use 512px images for textual inversion and increase the learning rate.
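For example, a quick sketch with Pillow to center-crop and resize a folder of source images to 512px before training (folder names and the file pattern are placeholders):

```python
from pathlib import Path

from PIL import Image, ImageOps  # pip install Pillow

SRC = Path("train_images")       # placeholder: folder with the original images
DST = Path("train_images_512")   # placeholder: output folder for the 512px copies
DST.mkdir(exist_ok=True)

for img_path in sorted(SRC.glob("*.png")):  # adjust the pattern for .jpg etc.
    with Image.open(img_path) as img:
        # Center-crop to a square and resize to 512x512 in one step.
        ImageOps.fit(img.convert("RGB"), (512, 512), Image.LANCZOS).save(DST / img_path.name)
```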

DarkAlchy commented 1 year ago

Use 512px images for textual inversion and increase the learning rate.

Does that go for a 2.1 TI as well?