JunMa11 opened 1 year ago
The dataset they use is public: https://laion.ai/blog/laion-5b/
That's not the dataset they used; they used a subset of it.
We currently provide three checkpoints, sd-v1-1.ckpt, sd-v1-2.ckpt and sd-v1-3.ckpt, which were trained as follows:

- sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en, then 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, an estimated aesthetics score > 5.0, and an estimated watermark probability < 0.5; the watermark estimate comes from the LAION-5B metadata, and the aesthetics score from an improved aesthetics estimator).
- sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-improved-aesthetics", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints.
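The "10% dropping of the text-conditioning" and the guidance scales mentioned above can be sketched in a few lines. This is a minimal illustration of the idea, not the actual training code; the function names are hypothetical:

```python
import random

def drop_text_conditioning(captions, drop_prob=0.1, rng=None):
    """During training, replace each caption with the empty string with
    probability drop_prob, so the model also learns the unconditional
    distribution needed for classifier-free guidance at sampling time."""
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else c for c in captions]

def cfg_combine(eps_uncond, eps_cond, scale):
    """At sampling time, extrapolate from the unconditional prediction
    toward the conditional one; `scale` is the guidance scale (e.g. 1.5-8.0)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With `scale=1.0` this reduces to the plain conditional prediction; larger scales push samples harder toward the text prompt at some cost in diversity.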
Here is new research for finetuning diffusion models that includes scripts for stable diffusion
https://github.com/rinongal/textual_inversion/tree/main/configs/stable-diffusion
This should still be open: the textual inversion repo helps with fine-tuning, but it doesn't explain how to train Stable Diffusion from scratch.
> Here is new research for finetuning diffusion models that includes scripts for stable diffusion
> https://github.com/rinongal/textual_inversion/tree/main/configs/stable-diffusion
That's for fine-tuning the prompt embedding, not the model itself.
I found a guide for actual fine-tuning! The compute is quite cheap; the example was trained for about $10 on two DL GPUs.
https://www.reddit.com/r/StableDiffusion/comments/xjo16u/guide_fine_tuning_stable_diffusion_by/
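Independent of that guide's specifics, the core of any such fine-tune is the standard DDPM-style denoising objective: noise a clean latent at a random timestep and regress the noise. A minimal numpy sketch, where `predict_eps` stands in for the U-Net and all names are hypothetical:

```python
import numpy as np

def diffusion_training_step(x0, predict_eps, alphas_cumprod, rng):
    """One denoising training step (DDPM-style sketch).
    x0: clean latent; predict_eps(x_t, t): the noise-prediction model;
    alphas_cumprod: cumulative noise schedule, one value per timestep."""
    t = rng.integers(len(alphas_cumprod))          # sample a random timestep
    eps = rng.normal(size=x0.shape)                # true noise
    a = alphas_cumprod[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps   # forward noising q(x_t | x_0)
    loss = np.mean((predict_eps(x_t, t) - eps) ** 2)  # simple MSE objective
    return loss
```

In a real fine-tune this loss would be backpropagated through the U-Net; the guide linked above covers the practical side (data loading, hyper-parameters, checkpointing).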
Hi, did anyone find the training tutorial? Thank you so much.
There are a ton of ways at this point; you can Google LoRA or textual inversion.
You can check here https://github.com/kohya-ss/sd-scripts
Dear Stable Diffusion Team,
Thanks for sharing the awesome work!
Would it be possible to provide some guidelines on training a new model on a custom dataset? E.g., how to prepare the dataset, how to start the training, and how to set the important hyper-parameters?