SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"
https://arxiv.org/abs/2311.12908
Apache License 2.0

Intro

This is the training code for Diffusion-DPO. The script is adapted from the diffusers library.

Model Checkpoints

The models below are initialized from StableDiffusion checkpoints and trained as described in the paper (reproducible with the launchers/ scripts, which assume 16 GPUs; scale gradient accumulation accordingly for other setups).

StableDiffusion1.5

StableDiffusion-XL-1.0

Use this notebook to compare generations from the base and DPO-tuned models. It also includes a sample of automatic quantitative evaluation using PickScore.
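If you want a quick side-by-side outside the notebook, one option is to load the DPO-tuned UNet into a standard diffusers pipeline and generate from the same prompt and seed as the base model. A minimal sketch, assuming the SD1.5 DPO checkpoint is published as a diffusers-format repo with a unet subfolder (the checkpoint ID below is an assumption; substitute the StableDiffusion1.5 checkpoint linked above):

# Minimal side-by-side sketch using diffusers (not part of this repo).
# NOTE: "mhdang/dpo-sd1.5-text2image-v1" is an assumed checkpoint ID; substitute
# the StableDiffusion1.5 DPO checkpoint linked above.
import torch
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

device = "cuda"
prompt = "a photo of an astronaut riding a horse on the moon"

# Baseline SD1.5 generation with a fixed seed.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
base_image = pipe(prompt, generator=torch.Generator(device=device).manual_seed(0)).images[0]

# Swap in the DPO-tuned UNet and regenerate with the same prompt and seed.
pipe.unet = UNet2DConditionModel.from_pretrained(
    "mhdang/dpo-sd1.5-text2image-v1", subfolder="unet", torch_dtype=torch.float16
).to(device)
dpo_image = pipe(prompt, generator=torch.Generator(device=device).manual_seed(0)).images[0]

base_image.save("base.png")
dpo_image.save("dpo.png")

The two images can then be scored automatically with PickScore, as the notebook demonstrates.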

Setup

pip install -r requirements.txt

Structure

train.py - main Diffusion-DPO training script
launchers/ - example launch scripts (e.g. launchers/sd15.sh used below)
requirements.txt - Python dependencies

Running the training

Example SD1.5 launch

# from launchers/sd15.sh
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="yuvalkirstain/pickapic_v2"

# Effective BS will be (N_GPU * train_batch_size * gradient_accumulation_steps)
# Paper used 2048. Training takes ~24 hours / 2000 steps
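# For example, with 16 GPUs, train_batch_size=1 and gradient_accumulation_steps=1,
# the effective batch size is 16; matching the paper's 2048 on 16 GPUs would
# require gradient_accumulation_steps=128 (2048 / 16 / 1).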

accelerate launch --mixed_precision="fp16" train.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --train_batch_size=1 \
  --dataloader_num_workers=16 \
  --gradient_accumulation_steps=1 \
  --max_train_steps=2000 \
  --lr_scheduler="constant_with_warmup" --lr_warmup_steps=500 \
  --learning_rate=1e-8 --scale_lr \
  --cache_dir="/export/share/datasets/vision_language/pick_a_pic_v2/" \
  --checkpointing_steps 500 \
  --beta_dpo 5000 \
  --output_dir="tmp-sd15"

Important Args

General

--pretrained_model_name_or_path: base model to initialize from (e.g. runwayml/stable-diffusion-v1-5)
--output_dir: where checkpoints and logs are written
--checkpointing_steps: how often (in steps) to save checkpoints
--mixed_precision: passed to accelerate launch; "fp16" in the example above

DPO

--beta_dpo: the DPO regularization strength β (5000 in the example launch)

Optimizers/learning rates

--learning_rate: base learning rate (1e-8 in the example)
--scale_lr: scale the learning rate by the effective batch size (N_GPU * train_batch_size * gradient_accumulation_steps)
--lr_scheduler, --lr_warmup_steps: learning-rate schedule and warmup length
--max_train_steps: total number of optimizer steps (2000 in the example)
--train_batch_size: per-GPU batch size
--gradient_accumulation_steps: increase this to reach the paper's effective batch size of 2048

Data

--dataset_name: Hugging Face dataset to train on (yuvalkirstain/pickapic_v2, i.e. Pick-a-Pic v2, in the example)
--cache_dir: local cache location for the dataset
--dataloader_num_workers: number of dataloader worker processes
Citation

@misc{wallace2023diffusion,
      title={Diffusion Model Alignment Using Direct Preference Optimization}, 
      author={Bram Wallace and Meihua Dang and Rafael Rafailov and Linqi Zhou and Aaron Lou and Senthil Purushwalkam and Stefano Ermon and Caiming Xiong and Shafiq Joty and Nikhil Naik},
      year={2023},
      eprint={2311.12908},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}