Fine-tuning of Stable Diffusion models
Run Dreambooth or Low-rank Adaptation (LoRA) from the same notebook.
$~$
Tested with Tesla T4 and A100 GPUs on Google Colab (some settings will not work on a T4 due to its limited memory).
Tested with Stable Diffusion v1-5 and Stable Diffusion v2-base.
This notebook borrows elements from ShivamShrirao's implementation but adds several features:
- Based on main Hugging Face Diffusers🧨 so it's easy to stay up-to-date
- Low-rank Adaptation (LoRA) for faster and more efficient fine-tuning (using cloneofsimo's implementation)
- Data augmentation such as random cropping, flipping and resizing, which can reduce the need to manually prepare and crop images in certain cases (e.g., training a style)
- More parameters for experimentation (LoRA rank, Adam optimizer parameters, cosine_with_restarts learning rate scheduler, etc.), all of which are dumped to a JSON file so you can remember what you did
- Drop the text conditioning for a fraction of training samples to improve classifier-free guidance sampling (as was done when SD v1-5 was fine-tuned)
- Image captioning using filenames or associated textfiles
- Training loss and prior class loss are tracked separately (can be visualized using TensorBoard)
- Option to generate exponentially-weighted moving average (EMA) weights for the UNet
- Inference with trained models uses Diffusers🧨 pipelines and does not rely on any web apps
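
As an illustration of the text-conditioning dropout listed above, here is a minimal sketch in plain Python. The function name `drop_captions` and the parameter `drop_prob` are hypothetical and not taken from the notebook, and 0.1 is only an illustrative default. The idea is that each caption is replaced by an empty prompt with some probability, so the model also learns an unconditional prediction that classifier-free guidance can use at sampling time.

```python
import random


def drop_captions(captions, drop_prob=0.1, rng=None):
    """Replace each caption with "" with probability drop_prob.

    Hypothetical helper for illustration; not the notebook's actual code.
    """
    rng = rng or random.Random()
    # rng.random() is uniform in [0, 1), so drop_prob=0 never drops
    # and drop_prob=1 always drops.
    return ["" if rng.random() < drop_prob else caption for caption in captions]


# Example: roughly 10% of captions become empty prompts.
batch = ["a photo of sks dog"] * 10
print(drop_captions(batch, drop_prob=0.1, rng=random.Random(0)))
```

In a real training loop the same decision would be made per sample before tokenization, so the dropped samples are encoded as the empty prompt.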
$~$
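
The EMA option in the feature list above can be sketched as follows. This is a simplified illustration on plain Python lists, assuming the standard update rule `ema = decay * ema + (1 - decay) * weight`; the name `ema_update` and the `decay` value are hypothetical, and in actual training the weights would be the UNet's tensors rather than floats.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return the updated exponentially-weighted moving average.

    Hypothetical helper for illustration; operates on lists of floats.
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Example with decay=0.5 so the effect is easy to follow by hand.
ema = [0.0]
ema = ema_update(ema, [1.0], decay=0.5)  # 0.5 * 0.0 + 0.5 * 1.0
ema = ema_update(ema, [1.0], decay=0.5)  # 0.5 * 0.5 + 0.5 * 1.0
print(ema)
```

A decay close to 1 makes the averaged weights change slowly, which smooths out step-to-step noise in the trained weights.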
Image comparing Dreambooth and LoRA (more information here):
full-size image here for the pixel-peepers