JunMa11 opened 1 year ago
The dataset they use is public: https://laion.ai/blog/laion-5b/
That's not the dataset they used; they used a subset of it.
We currently provide three checkpoints, sd-v1-1.ckpt, sd-v1-2.ckpt and sd-v1-3.ckpt, which were trained as follows:

- sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en, then 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on "laion-improved-aesthetics" (a subset of laion2B-en, filtered to images with an original size >= 512x512, an estimated aesthetics score > 5.0, and an estimated watermark probability < 0.5; the watermark estimate comes from the LAION-5B metadata, and the aesthetics score from an improved aesthetics estimator).
- sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-improved-aesthetics", with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints.
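The "10% dropping of the text-conditioning" and the guidance scales mentioned above can be sketched in a few lines. This is a minimal illustration of the idea, not the actual training code; the function names are hypothetical:

```python
import random

def drop_text_conditioning(captions, drop_prob=0.1, rng=None):
    """During training, replace each caption with the empty string with
    probability drop_prob, so the model also learns the unconditional
    distribution needed for classifier-free guidance at sampling time."""
    rng = rng or random.Random()
    return ["" if rng.random() < drop_prob else c for c in captions]

def cfg_combine(eps_uncond, eps_cond, scale):
    """At sampling time, extrapolate from the unconditional prediction
    toward the conditional one; `scale` is the guidance scale (e.g. 1.5-8.0)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With `scale=1.0` this reduces to the plain conditional prediction; larger scales push samples harder toward the text prompt at some cost in diversity.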
Here is new research for finetuning diffusion models that includes scripts for stable diffusion
https://github.com/rinongal/textual_inversion/tree/main/configs/stable-diffusion
This should still be open: the textual inversion repo helps with fine-tuning, but it doesn't explain how to train Stable Diffusion from scratch.
> Here is new research for finetuning diffusion models that includes scripts for stable diffusion
> https://github.com/rinongal/textual_inversion/tree/main/configs/stable-diffusion
That's for fine-tuning the prompt embedding, not the model itself.
I found a guide for actual fine-tuning! The compute is quite cheap; the example was trained for about $10 on two DL GPUs.
https://www.reddit.com/r/StableDiffusion/comments/xjo16u/guide_fine_tuning_stable_diffusion_by/
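Independent of that guide's specifics, the core of any such fine-tune is the standard DDPM-style denoising objective: noise a clean latent at a random timestep and regress the noise. A minimal numpy sketch, where `predict_eps` stands in for the U-Net and all names are hypothetical:

```python
import numpy as np

def diffusion_training_step(x0, predict_eps, alphas_cumprod, rng):
    """One denoising training step (DDPM-style sketch).
    x0: clean latent; predict_eps(x_t, t): the noise-prediction model;
    alphas_cumprod: cumulative noise schedule, one value per timestep."""
    t = rng.integers(len(alphas_cumprod))          # sample a random timestep
    eps = rng.normal(size=x0.shape)                # true noise
    a = alphas_cumprod[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * eps   # forward noising q(x_t | x_0)
    loss = np.mean((predict_eps(x_t, t) - eps) ** 2)  # simple MSE objective
    return loss
```

In a real fine-tune this loss would be backpropagated through the U-Net; the guide linked above covers the practical side (data loading, hyper-parameters, checkpointing).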
Hi, did anyone find the training tutorial? Thank you so much.
There are a ton of ways at this point; you can Google LoRA or textual inversion.
You can check here https://github.com/kohya-ss/sd-scripts
Dear Stable Diffusion Team,
Thanks for sharing the awesome work!
Would it be possible to provide some guidelines on training a new model on a custom dataset? E.g., how to prepare the dataset, how to start the training, and how to set the important hyper-parameters?