SunnierLee / DP-ImaGen

[USENIX Security 2024] PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
MIT License

Impact of Validation Data Structure on Training Phase #3

Open · uha225 opened this issue 3 weeks ago

uha225 commented 3 weeks ago

This issue follows from Issue #2 (Impact of Validation Data Structure on Training Phase). The training script train_imagenet_classifier.py assumes that both the training and validation images are organized by class, with each class's images in a dedicated subdirectory. Since the validation images do not come in labeled directories, is it necessary to reorganize them manually (i.e., create a labeled folder for each class in val) for the script to work properly?
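For reference, the class-per-subdirectory layout described above (val/<class>/<image>) can be produced with a short script like the one below. This is a minimal sketch, not code from DP-ImaGen: it assumes a hypothetical CSV file with a header row `filename,class_name` that maps each validation image to its class folder name (for ImageNet, typically the WordNet ID, e.g. n01440764). You would have to build that file from whichever label source you use.

```python
import csv
import shutil
from pathlib import Path


def organize_val_by_class(val_dir: Path, label_csv: Path) -> int:
    """Move each image in val_dir into val_dir/<class_name>/ according to label_csv.

    label_csv is a HYPOTHETICAL file with a header row "filename,class_name";
    it is not shipped with DP-ImaGen. Returns the number of files moved.
    """
    moved = 0
    with open(label_csv, newline="") as f:
        for row in csv.DictReader(f):
            src = val_dir / row["filename"]
            if not src.is_file():
                # Already moved on a previous run, or missing: skip instead of failing.
                continue
            dst_dir = val_dir / row["class_name"]
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst_dir / src.name))
            moved += 1
    return moved
```

Skipping files that are no longer at the top level makes the script safe to re-run after an interruption.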

Additionally, I moved 10% of each class's training images into val to serve as the validation set. Could you estimate how this approach might affect the classifier's accuracy?

Steps taken:

  1. Prepared the ImageNet dataset according to the README.
  2. Ran train_imagenet_classifier.py and encountered errors because the validation data is not organized by label.

SunnierLee commented 3 weeks ago

Hi, we created a labeled folder for each class in the val set. Taking 10% of the training images as the val set should work as well.
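Carving a per-class validation split out of the training set, as described above, could look like the following sketch. It is not code from the repo: the function name `split_train_val`, the fixed seed, and the rounding-down rule `int(len(files) * frac)` are assumptions. It moves files rather than copying them, so the held-out images are no longer used for training.

```python
import random
import shutil
from pathlib import Path


def split_train_val(train_dir: Path, val_dir: Path, frac: float = 0.1, seed: int = 0) -> dict[str, int]:
    """Move a fraction of each class's images from train_dir/<cls>/ to val_dir/<cls>/.

    Hypothetical helper; returns {class_name: number_of_images_moved}.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts: dict[str, int] = {}
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        # Sort before sampling so the result depends only on the seed, not on directory order.
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_val = int(len(files) * frac)  # round down
        dst = val_dir / class_dir.name
        dst.mkdir(parents=True, exist_ok=True)
        for f in rng.sample(files, n_val):
            shutil.move(str(f), str(dst / f.name))
        counts[class_dir.name] = n_val
    return counts
```

Because of the rounding down, a class with fewer than 10 images would contribute no validation images at `frac=0.1`; at ImageNet scale (over a thousand training images per class) this does not come up.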

SunnierLee commented 3 weeks ago

Also, we have fixed some bugs in the repo:

  1. You need to reinstall opacus:

         cd xxx/DP-ImaGen
         pip uninstall opacus
         pip install -e src/opacus

  2. Please replace your src/SemanticQuery/query_semantics.py, src/PRIVIMAGE+D/runners/train_dpdm_base.py and src/PRIVIMAGE+D/model/layers.py with the latest files in the repo.
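After reinstalling, one way to confirm that Python now picks up the copy in src/opacus rather than a previously installed release is to check where the `opacus` package is imported from. This is a minimal sketch, not part of the repo: the helper names are hypothetical, and it assumes the checkout lives at the (elided) path xxx/DP-ImaGen from the step above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def opacus_source_dir() -> Optional[Path]:
    """Directory the `opacus` package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec("opacus")
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the path to opacus/__init__.py; its parent is the package directory.
    return Path(spec.origin).resolve().parent


def is_repo_copy(pkg_dir: Optional[Path], repo_root: Path) -> bool:
    """True if pkg_dir lies under <repo_root>/src/opacus (hypothetical check)."""
    if pkg_dir is None:
        return False
    expected = (repo_root / "src" / "opacus").resolve()
    pkg_dir = pkg_dir.resolve()
    return pkg_dir == expected or expected in pkg_dir.parents


if __name__ == "__main__":
    location = opacus_source_dir()
    print("opacus imported from:", location)
    # Replace "xxx/DP-ImaGen" with the actual path to your checkout.
    print("uses repo copy:", is_repo_copy(location, Path("xxx/DP-ImaGen")))
```

If the second line prints False, the old opacus installation is probably still earlier on the import path.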