This is the training code for Diffusion-DPO. The script is adapted from the diffusers library.

The models below are initialized from Stable Diffusion models and trained as described in the paper (replicable with the `launchers/` scripts assuming 16 GPUs; scale gradient accumulation accordingly).

Use `quick_samples.ipynb` to compare generations. It also has a sample of automatic quantitative evaluation using PickScore.
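For reference, scoring generations with PickScore typically looks like the following. This is a minimal sketch, assuming the public `yuvalkirstain/PickScore_v1` checkpoint and the CLIP-H processor it was trained with; see `utils/` for the scoring code actually used in this repo.

```python
import torch
from transformers import AutoProcessor, AutoModel

# Assumption: public PickScore_v1 checkpoint + its CLIP-H processor.
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval().to(device)

@torch.no_grad()
def pickscore(prompt, images):
    """Return PickScore preference scores for a list of PIL images given one prompt."""
    image_inputs = processor(images=images, return_tensors="pt").to(device)
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt").to(device)
    image_embs = model.get_image_features(**image_inputs)
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
    text_embs = model.get_text_features(**text_inputs)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    # Higher score = image is preferred for this prompt.
    return (model.logit_scale.exp() * (text_embs @ image_embs.T))[0]
```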
```bash
pip install -r requirements.txt
```
Repository structure:

- `launchers/`: example scripts for running SD1.5 or SDXL training
- `utils/`: scoring models for evaluation or AI feedback (PickScore, HPS, Aesthetics, CLIP)
- `quick_samples.ipynb`: visualizations from a pretrained model vs. the baseline
- `requirements.txt`: basic pip requirements
- `train.py`: the main script; it is pretty bulky at >1000 lines, and the training loop starts around L1000 at this commit (ctrl-F "for epoch")
- `upload_model_to_hub.py`: uploads a model checkpoint to HF (simple utility; current values are placeholders)

Example SD1.5 launch:
```bash
# from launchers/sd15.sh
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="yuvalkirstain/pickapic_v2"

# Effective BS will be (N_GPU * train_batch_size * gradient_accumulation_steps)
# Paper used 2048. Training takes ~24 hours / 2000 steps

accelerate launch --mixed_precision="fp16" train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --train_batch_size=1 \
  --dataloader_num_workers=16 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=2000 \
  --lr_scheduler="constant_with_warmup" --lr_warmup_steps=500 \
  --learning_rate=1e-8 --scale_lr \
  --cache_dir="/export/share/datasets/vision_language/pick_a_pic_v2/" \
  --checkpointing_steps 500 \
  --beta_dpo 5000 \
  --output_dir="tmp-sd15"
```
Important arguments:

- `--pretrained_model_name_or_path`: what model to train/initialize from
- `--output_dir`: where to save/log to
- `--seed`: training seed (not set by default)
- `--sdxl`: run SDXL training
- `--sft`: run SFT instead of DPO
- `--beta_dpo`: the KL-divergence parameter beta for DPO (see the loss sketch after this list)
- `--choice_model`: model for AI feedback (Aesthetics, CLIP, PickScore, HPS)
- `--max_train_steps`: how many train steps to take
- `--gradient_accumulation_steps` / `--train_batch_size`: see the notes above and in the script for the effective batch size
- `--checkpointing_steps`: how often to save the model
- `--gradient_checkpointing`: turned on automatically for SDXL
- `--learning_rate`
- `--scale_lr`: found to be very helpful, but not on by default in the code
- `--lr_scheduler`: type of LR warmup/decay; default is linear warmup to constant
- `--lr_warmup_steps`: number of scheduler warmup steps
- `--use_adafactor`: Adafactor instead of Adam (lower memory; default for SDXL)
- `--dataset_name`: if you want to switch from Pick-a-Pic
- `--cache_dir`: where the dataset is cached locally (change this to fit your file system)
- `--resolution`: defaults to 512 for non-SDXL, 1024 for SDXL
- `--random_crop` and `--no_hflip`: change the data augmentation
- `--dataloader_num_workers`: total number of dataloader workers
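To make the role of `--beta_dpo` concrete, here is a minimal sketch of the Diffusion-DPO objective. It assumes the per-sample denoising MSE losses have already been computed for the preferred (w) and dispreferred (l) images under both the trained and the frozen reference UNet; the variable names are illustrative rather than the ones in `train.py`.

```python
import torch.nn.functional as F

def diffusion_dpo_loss(model_w_err, model_l_err, ref_w_err, ref_l_err,
                       beta_dpo=5000.0):
    """Sketch of the Diffusion-DPO objective.

    Each *_err tensor holds per-sample denoising MSE, e.g.
    ||eps - eps_theta(x_t, t, c)||^2, for the preferred (w) and
    dispreferred (l) images under the trained and reference UNets.
    """
    model_diff = model_w_err - model_l_err  # trained model: preferred minus dispreferred
    ref_diff = ref_w_err - ref_l_err        # frozen reference model, same quantity
    # Larger beta_dpo = stronger pull toward the reference model (KL penalty).
    inside_term = -0.5 * beta_dpo * (model_diff - ref_diff)
    return -F.logsigmoid(inside_term).mean()
```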
Citation:

```bibtex
@misc{wallace2023diffusion,
      title={Diffusion Model Alignment Using Direct Preference Optimization},
      author={Bram Wallace and Meihua Dang and Rafael Rafailov and Linqi Zhou and Aaron Lou and Senthil Purushwalkam and Stefano Ermon and Caiming Xiong and Shafiq Joty and Nikhil Naik},
      year={2023},
      eprint={2311.12908},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```