HanxunH / Unlearnable-Examples

[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable
https://hanxunh.github.io/Unlearnable-Examples/
MIT License

Mismatch of the training data augmentation between QuickStart.ipynb and main.py #6

Closed · ZhengyuZhao closed this issue 2 years ago

ZhengyuZhao commented 2 years ago

Thanks for the interesting work and the detailed code.

  1. I may have noticed a mismatch in the training data augmentation between QuickStart.ipynb and main.py. Let's denote the clean image as x, the perturbation as noise, and the poisoned image as x'.

In QuickStart.ipynb,

```python
unlearnable_train_dataset = datasets.CIFAR10(root='../datasets', train=True, download=True, transform=train_transform)
perturb_noise = noise.mul(255).clamp(0, 255).permute(0, 2, 3, 1).to('cpu').numpy()
unlearnable_train_dataset.data = unlearnable_train_dataset.data.astype(np.float32)
for i in range(len(unlearnable_train_dataset)):
    unlearnable_train_dataset.data[i] += perturb_noise[i]
    unlearnable_train_dataset.data[i] = np.clip(unlearnable_train_dataset.data[i], a_min=0, a_max=255)
```

This means that x' = train_transform(x) + noise.

However, in main.py, the input to the training process is PoisonCIFAR10, which is defined at line 376 of dataset.py. There, if I understand correctly, PoisonCIFAR10 is constructed by adding the perturbation.pt noise to the raw CIFAR10 data, which instead follows x' = train_transform(x + noise).
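For context, here is a minimal sketch of how I read that construction (the class name, noise_path argument, and noise shape here are simplifications for illustration, not the actual dataset.py signatures):

```python
import numpy as np
import torch
from torchvision import datasets

class PoisonedCIFAR10Sketch(datasets.CIFAR10):
    # Simplified sketch: bake precomputed sample-wise noise into the raw uint8
    # pixels, so any transform passed to the dataset is applied on top of x + noise.
    def __init__(self, root, noise_path, **kwargs):
        super().__init__(root, **kwargs)
        noise = torch.load(noise_path)  # e.g. perturbation.pt, assumed shape (N, C, H, W)
        noise = noise.mul(255).permute(0, 2, 3, 1).numpy()  # to (N, H, W, C) pixel scale
        data = self.data.astype(np.float32) + noise
        self.data = np.clip(data, 0, 255).astype(np.uint8)
```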

Could you please confirm whether my understanding is correct? If so, which version did you use to generate the results in your paper?

  2. By the way, I also noticed that when generating the perturbations via perturbation.py, no train_transform is used at all. Could you please explain why this is the case?
HanxunH commented 2 years ago

Hi,

Thanks for your interest in our work.

To reproduce the results in the paper, please check the *.sh files in the scripts folder for each corresponding experiment.

For Q1: I believe both are x' = x + noise. The train_transform is applied in the __getitem__() function, which is called by the DataLoader during iteration. In both dataset.py and QuickStart.ipynb, the code modifies self.data in CIFAR10 before it is fed to the DataLoader, so they should be the same: x' = x + noise, with the augmentations then applied in the training loop, i.e. train_transform(x + noise).
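To make the order of operations concrete, this is roughly what torchvision's CIFAR10.__getitem__ does (paraphrased excerpt; exact code may differ across torchvision versions):

```python
def __getitem__(self, index):
    # self.data has already been modified to hold x + noise at this point.
    img, target = self.data[index], self.targets[index]
    img = Image.fromarray(img)
    if self.transform is not None:
        img = self.transform(img)  # train_transform runs here, after the noise is added
    return img, target
```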

For Q2: I did not consider using augmentations when generating the noise at the start, since it already produces effective noise without them. I'm not sure whether adding them would improve or degrade the effectiveness. My guess is that it would improve it; worth a shot.
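If anyone wants to try this, a hypothetical sketch of folding augmentations into the error-minimizing noise update might look like the following. None of this is from perturbation.py; model, criterion, augment, and all hyperparameters are placeholders:

```python
import torch

def generate_noise_with_aug(model, criterion, augment, x, y,
                            epsilon=8 / 255, alpha=0.8 / 255, num_steps=20):
    # Hypothetical: optimize sample-wise noise against an augmented view of x + noise.
    # Noise is added BEFORE augmentation, since spatial transforms (crop/flip)
    # would otherwise misalign the per-pixel noise with the image.
    noise = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        loss = criterion(model(augment(x + noise)), y)
        grad = torch.autograd.grad(loss, noise)[0]
        # Error-minimizing: step against the gradient to drive the loss down.
        noise = (noise - alpha * grad.sign()).clamp(-epsilon, epsilon).detach().requires_grad_(True)
    return noise.detach()
```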

Best,