Impact of Validation Data Structure on Training Phase

uha225 commented 3 weeks ago

This issue follows from Issue #2 (Impact of Validation Data Structure on Training Phase). The training script train_imagenet_classifier.py assumes that both training and validation images are structured by class (with each class’s images in a dedicated subdirectory). Given that the validation images do not come in labeled directories, would adjusting this manually (i.e., creating labeled folders for each class in val) be necessary for proper functionality?

Additionally, I moved 10% of the training images in each class to val for validation. Could you estimate how this approach affects the classifier’s accuracy?

Steps Taken:

Prepared the ImageNet dataset according to the README.
Ran train_imagenet_classifier.py and encountered issues because the validation data was not organized by label.
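For reference, the manual adjustment asked about above could look roughly like the sketch below. It assumes a hypothetical mapping file (here called val_labels.csv, not part of the repo) with one `filename,class` row per validation image. The ground-truth file shipped with ImageNet uses a different format and would need converting first. The function name is also only illustrative.

```python
import csv
import shutil
from pathlib import Path


def organize_val_by_class(val_dir, mapping_csv):
    """Move each flat validation image into val_dir/<class>/.

    mapping_csv is a hypothetical file with rows of the form
    `filename,class`; it is an assumption, not something the repo provides.
    Returns the number of images moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    with open(mapping_csv, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 2:
                continue  # skip blank or malformed rows
            filename, class_name = row
            src = val_dir / filename
            if not src.is_file():
                continue  # already moved, or listed but not present
            dst_dir = val_dir / class_name
            dst_dir.mkdir(exist_ok=True)
            shutil.move(str(src), str(dst_dir / filename))
            moved += 1
    return moved
```

Running it twice is harmless: images that have already been moved are skipped on the second pass.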

SunnierLee commented 3 weeks ago

Hi, we created labeled folders for each class in the val set. Taking 10% of the training images as the val set should work as well.
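A minimal sketch of the per-class 10% split, in case it helps others reproduce it. It assumes train is laid out as train/<class>/<image> and writes the held-out images to val/<class>/. The function name, the fixed seed, and rounding down per class are illustrative choices, not something taken from the repo.

```python
import random
import shutil
from pathlib import Path


def split_train_to_val(train_dir, val_dir, fraction=0.1, seed=0):
    """Move `fraction` of each class's training images into val_dir/<class>/.

    Returns a dict mapping class name to the number of images moved.
    """
    train_dir, val_dir = Path(train_dir), Path(val_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    moved = {}
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_val = int(len(images) * fraction)  # rounds down per class
        target = val_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for img in rng.sample(images, n_val):
            shutil.move(str(img), str(target / img.name))
        moved[class_dir.name] = n_val
    return moved
```

Note that the images are moved, not copied, so the classifier trains on the remaining 90% of each class.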

SunnierLee commented 3 weeks ago

Besides, we fixed some bugs in the repo!

You need to reinstall opacus:

    cd xxx/DP-ImaGen
    pip uninstall opacus
    pip install -e src/opacus

Please replace your src/SemanticQuery/query_semantics.py, src/PRIVIMAGE+D/runners/train_dpdm_base.py and src/PRIVIMAGE+D/model/layers.py with the latest files in the repo.
