JinseongP / DPTrainer

Official PyTorch implementation of "In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification", CVPR 2024.
MIT License



Jinseong Park *, Yujin Choi *, and Jaewook Lee
* Equal contribution

| paper link |

Step-by-Step Algorithm

  0. Environment configuration
  1. Train EDM models on 4% of the data, or download the generated images
  2. Train warm-up classifiers with standard training
  3. DP-SGD

0. Environment configuration

  1. Create a Docker container (or an equivalent virtual environment with CUDA 11.8 and PyTorch 1.13.0)

    sudo docker run -i -t --ipc=host --name dptrainer --gpus=all anibali/pytorch:1.13.0-cuda11.8 /bin/bash
  2. Install the required packages

    pip install -r requirements.txt
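
After installation, a quick sanity check can confirm that the environment matches the versions named above. The following is a minimal sketch, not part of this repository: the helper names `parse_version` and `matches_readme` are hypothetical, and the check only compares version strings against PyTorch 1.13.0 and CUDA 11.8.

```python
import re


def parse_version(version):
    """Return the leading numeric parts of a version string, e.g. '1.13.0+cu118' -> (1, 13, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (as in '11.8') is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def matches_readme(torch_version, cuda_version):
    """True if the versions match those named in this README (PyTorch 1.13.0, CUDA 11.8)."""
    return (
        parse_version(torch_version) == (1, 13, 0)
        and parse_version(cuda_version)[:2] == (11, 8)
    )


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        cuda = torch.version.cuda or "0.0"  # torch.version.cuda is None on CPU-only builds
        print("torch", torch.__version__, "| CUDA", torch.version.cuda,
              "| GPU visible:", torch.cuda.is_available())
        print("matches README versions:", matches_readme(torch.__version__, cuda))
```

Other version combinations may also work; the check only reports whether the environment is the one this README was written against.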

1. Training EDM model with 4% data

We provide the images generated by EDM models trained on 4% of the data (used as public data) in DATADRIVE.

For CIFAR-10, the number after "weight" in each file name indicates the discriminator weight used in Discriminator Guidance (DG).

Place the synthetic data and the public-data indices in the directory layout shown below.

├── data
│   ├── cifar-10-edm
│   │   ├── cifar10_data_sampled_index.pt
│   │   ├── cifar10_data_sampled_weight0.npz
│   │   ├── ...
│   ├── cifar-100-edm
├── ...
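
Before moving on, it can help to verify that the files ended up where the tree above expects them. The sketch below is an illustration only: the helper `missing_entries` is hypothetical, and it checks just the entries that the tree names explicitly (the elided `...` entries and the contents of `cifar-100-edm` are not checked).

```python
from pathlib import Path

# File names taken from the tree above; only the CIFAR-10 entries are listed there.
EXPECTED_CIFAR10_FILES = (
    "cifar10_data_sampled_index.pt",
    "cifar10_data_sampled_weight0.npz",
)


def missing_entries(root):
    """Return the expected paths under `root` that do not exist yet."""
    root = Path(root)
    expected = [root / "cifar-10-edm" / name for name in EXPECTED_CIFAR10_FILES]
    expected.append(root / "cifar-100-edm")  # its contents are elided in the tree
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_entries("data")
    if missing:
        print("Missing entries:")
        for path in missing:
            print("  ", path)
    else:
        print("All entries named in the README tree are present.")
```

Run it from the repository root so that the relative path `data` resolves correctly.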

Alternatively, you can train the EDM models end to end on 4% of the data:

0) Follow the requirements of EDM

1) Prepare subsampled dataset

2) Train EDM model

3) Generate EDM samples

4) (Optional) Apply Discriminator Guidance
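
As an illustration of step 1, the sketch below draws a 4% subset of training indices. It is an assumption-laden example rather than the procedure used to create the provided index file: the function name `sample_public_indices`, the uniform sampling without replacement, and the seed are all hypothetical, and the actual subset may have been drawn differently (for example, per class).

```python
import random


def sample_public_indices(num_examples, fraction=0.04, seed=0):
    """Draw a sorted, uniformly random subset of dataset indices without replacement."""
    num_public = round(num_examples * fraction)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(num_examples), num_public))


if __name__ == "__main__":
    # CIFAR-10 and CIFAR-100 each have 50,000 training images, so 4% is 2,000.
    indices = sample_public_indices(50_000)
    print("number of public examples:", len(indices))
```

To reproduce the reported setup, prefer the index file provided with the generated images over a freshly sampled subset.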

2. Warm-up Training

Please refer to the /examples/ folder in this repository.

Follow the instructions in /examples/cifar10_warmup.ipynb and /examples/cifar100_warmup.ipynb.

3. DP-SGD

!python main.py --gpu {GPU} --max_grad_norm {MAX_GRAD_NORM} --epsilon {EPSILON} --delta {DELTA}  --data {DATA} --optimizer "{OPTIMIZER}" --epochs {EPOCHS} --batch_size {BATCH_SIZE} --max_physical_batch_size {MAX_PHYSICAL_BATCH_SIZE} --model_name {MODEL_NAME}  --n_class {N_CLASSES} --augmult {N_AUGMULT} --path {PATH} --name {NAME} --memo {MEMO} --public_batch_size {PUBLIC_BATCH_SIZE} --extender {EXTENDER}  --pretrained_dir {WARMUP_PATH}
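
When running several configurations, the command template above can be filled in programmatically. The sketch below is a hypothetical helper, not part of this repository: `build_command` simply turns a dictionary into `--flag value` pairs, and the values shown are placeholders rather than recommended settings. Only a subset of the flags is included; the remaining ones can be added to the dictionary in the same way.

```python
import shlex


def build_command(args):
    """Turn a {flag_name: value} mapping into a `python main.py ...` command string."""
    parts = ["python", "main.py"]
    for name, value in args.items():
        parts += [f"--{name}", str(value)]
    return shlex.join(parts)  # quotes values that contain spaces


# Placeholder values for illustration only; see the example notebooks for real settings.
example_args = {
    "gpu": 0,
    "max_grad_norm": 1.0,
    "epsilon": 8.0,
    "delta": 1e-5,
    "epochs": 10,
    "batch_size": 4096,
    "max_physical_batch_size": 256,
}

if __name__ == "__main__":
    print(build_command(example_args))
```

In a notebook, the resulting string can be passed to a shell cell; from a script, `subprocess.run(shlex.split(command), check=True)` is one way to launch it.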

For concrete usage, follow the instructions in /examples/cifar10_dpsgd.ipynb and /examples/cifar100_dpsgd.ipynb.

For details of each parameter, please refer to main.py.
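
As background on what a clipping threshold such as `--max_grad_norm` controls, the sketch below shows one gradient aggregation step of generic DP-SGD: clip each per-sample gradient, sum, add Gaussian noise, and average. It is a plain-Python illustration of the textbook algorithm, not the code in `main.py`. The `noise_multiplier` parameter is an assumption introduced here; how the repository derives its noise level from `--epsilon` and `--delta` is not shown.

```python
import math
import random


def clip_gradient(grad, max_grad_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / (norm + 1e-12))  # small constant avoids division by zero
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum them, add Gaussian noise, and average."""
    clipped = [clip_gradient(g, max_grad_norm) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping threshold.
    std = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [x / len(per_sample_grads) for x in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-sample gradients
    print(dp_sgd_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds the influence of any single example, which is what lets the added noise translate into a privacy guarantee.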

4. Citation

    author    = {Park, Jinseong and Choi, Yujin and Lee, Jaewook},
    title     = {In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {12236-12246}