Fine-tuning of Stable Diffusion models


Tested with Tesla T4 and A100 GPUs on Google Colab. Some settings will not work on the T4 because of its limited memory.

This notebook borrows elements from ShivamShrirao's implementation, but differs in the following ways:

Based on the main Hugging Face Diffusers🧨 codebase, so it's easy to stay up to date
Low-Rank Adaptation (LoRA) for faster and more efficient fine-tuning (using cloneofsimo's implementation)
Data augmentation such as random cropping, flipping and resizing, which can reduce the need to manually prepare and crop images in certain cases (e.g., training a style)
More parameters for experimentation (LoRA rank, Adam optimizer parameters, cosine_with_restarts learning rate scheduler, etc.), all of which are saved to a JSON file so you can keep track of what you did
Text conditioning is dropped for some training samples to improve classifier-free guidance sampling (as was done when SD v1-5 was fine-tuned)
Image captioning using filenames or associated text files
Training loss and prior class loss are tracked separately (and can be visualized with TensorBoard)
Option to generate exponentially weighted moving average (EMA) weights for the UNet
Inference with trained models uses Diffusers🧨 pipelines and does not rely on any web apps
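
To make the captioning and text-conditioning features above more concrete, here is a minimal sketch in plain Python. It is not the notebook's actual code: the helper names `caption_for` and `maybe_drop_caption`, the `.txt` sidecar convention, and the rule that underscores in a filename stand for spaces are all assumptions made for illustration.

```python
import random
from pathlib import Path


def caption_for(image_path: Path) -> str:
    """Return a caption for an image (hypothetical helper).

    Assumption: a text file with the same stem and a .txt suffix takes
    priority; otherwise the filename stem is used, with underscores
    read as spaces.
    """
    sidecar = image_path.with_suffix(".txt")
    if sidecar.is_file():
        return sidecar.read_text(encoding="utf-8").strip()
    return image_path.stem.replace("_", " ")


def maybe_drop_caption(caption: str, drop_prob: float, rng: random.Random) -> str:
    """Replace the caption with the empty prompt with probability drop_prob.

    Training on a share of unconditional (empty-prompt) samples is what
    lets classifier-free guidance contrast conditional and unconditional
    predictions at sampling time.
    """
    return "" if rng.random() < drop_prob else caption
```

In a training loop, something like `maybe_drop_caption(caption_for(path), 0.1, rng)` would be called once per sample before tokenization. The value 0.1 is only an example here, so check the notebook's own parameter for the setting it actually uses.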
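
The EMA option above can be illustrated with a small sketch of the update rule itself. This is a plain-Python illustration on a flat list of floats, not the notebook's implementation, which operates on the UNet's tensors and may use a different decay schedule. The function name `ema_update` and the decay value are assumptions.

```python
from typing import List


def ema_update(ema: List[float], current: List[float], decay: float) -> List[float]:
    """One exponentially weighted moving average step (hypothetical helper).

    Each averaged weight moves a fraction (1 - decay) of the way toward
    the current training weight, so a decay close to 1 averages over a
    longer history of training steps.
    """
    if len(ema) != len(current):
        raise ValueError("ema and current must have the same length")
    return [decay * e + (1.0 - decay) * c for e, c in zip(ema, current)]


# Usage: start the average at the initial weights, then update after
# every optimizer step with the latest weights.
ema_weights = [1.0, -2.0]
for step_weights in ([0.0, 0.0], [0.0, 0.0]):
    ema_weights = ema_update(ema_weights, step_weights, decay=0.9)
```

The averaged weights are kept alongside the trained ones and used only at inference time. They tend to change more smoothly than the raw weights from step to step.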


Image comparing Dreambooth and LoRA (more information here):

brian6091 / Dreambooth