HanxunH / Unlearnable-Examples

[ICLR2021] Unlearnable Examples: Making Personal Data Unexploitable
https://hanxunh.github.io/Unlearnable-Examples/
MIT License

Why use custom models? Cannot reproduce with torchvision model #1

Closed ajsanjoaquin closed 3 years ago

ajsanjoaquin commented 3 years ago

Hi, thank you for sharing your code!

Is there any specific reason you chose to build your own models rather than using the ones provided by torchvision?

I am trying to reproduce the results in the quickstart notebook with a clean, default ResNet-18 (torchvision.models.resnet18()), leaving all other code unchanged. It generates the error-minimizing noise normally, but at the training stage it produces accuracies far higher (50%) than those reported in the paper and in the notebook (screenshot below). Also, when using your code to visualize the noise (the cell "Visualize Clean Images, Error-Minimizing Noise, Unlearnable Images"), it produces a black image in place of the noise.

However, when using your provided ResNet-18 model, I can reproduce your notebook's results, but generating the noise is far slower than with the torchvision ResNet-18 (almost 2 hours on yours vs. 20 minutes on the torchvision model).

Inspecting your ResNet code, I don't see any specific component that would purposefully slow it down. I did have to remove import mlconfig in your model's __init__ and the associated references to it, because it does not seem to be part of your package and I was getting an import error otherwise.

Here are the training accuracy (on the unlearnable training set) and test accuracy (on the clean test set) for a torchvision ResNet-18:

HanxunH commented 3 years ago

ResNet implementation: There is no specific reason for not using torchvision models; the ResNet implementation is simply copied from another project, see https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py. The difference between the two implementations is the kernel size, padding, and stride of conv1. torchvision's model reduces the height and width after conv1; my guess is that it is meant for ImageNet, which has larger resolutions.
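A minimal sketch of that difference, assuming 32x32 CIFAR-10 inputs (the CIFAR-style conv1 follows the linked kuangliu implementation):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

x = torch.randn(1, 3, 32, 32)  # a CIFAR-10-sized input

# torchvision's ResNet-18 targets 224x224 ImageNet inputs: a 7x7
# stride-2 conv1 followed by a stride-2 max-pool shrinks a 32x32
# image to 8x8 before the first residual block.
tv = resnet18()
print(tv.conv1)                       # 7x7 conv, stride 2, padding 3
print(tv.maxpool(tv.conv1(x)).shape)  # torch.Size([1, 64, 8, 8])

# The CIFAR-style ResNet keeps the full 32x32 resolution through conv1.
cifar_conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
print(cifar_conv1(x).shape)           # torch.Size([1, 64, 32, 32])
```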

Performance issue: The noise is visualized by normalizing it to [0, 255] so that it can be displayed, so a black image may mean the noise is all zeros. Have you examined the noise by printing it? I have reproduced the result you mentioned by swapping in torchvision's model: validation accuracy is around 40%, but I was unable to reproduce the black noise image. I also tested generating the noise with our ResNet implementation and applying it to torchvision.models.resnet18; the validation accuracy is around 25%. Given this difference between the implementations, my guess is that the noise-generation hyperparameters, i.e. the termination condition and the number of steps, need updating.
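For reference, a quick way to check whether the generated noise is degenerate (the variable noise is a hypothetical stand-in for the notebook's perturbation tensor):

```python
import torch

noise = torch.zeros(1, 3, 32, 32)  # stand-in for the generated perturbation
print(noise.min().item(), noise.max().item(), noise.abs().mean().item())
# An (almost) all-zero tensor has zero min-max range, and the
# visualization cell would render it as a black image.
```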

Time issue: The results shown in the notebook were produced on an RTX 2080 Ti. Generating the noise took around 3 minutes per epoch for a total of 10 epochs, so about 30 minutes overall. Removing mlconfig has no effect on speed, and I do not see any other reason for the slowdown on your machine. Can you provide more details on the hardware? Also, because torchvision's model reduces the height and width of the feature maps, it makes sense that it runs faster.

ajsanjoaquin commented 3 years ago

Thank you for the explanation.

For the noise-visualization issue, I discovered that some noise tensors have their minimum value equal to their maximum value, which causes a division by zero in your normalization code. Interestingly, it did not raise a division-by-zero error (PyTorch tensor division follows IEEE 754 and yields inf/NaN rather than raising), although the resulting normalized noise is very close to 0. Handling the min == max case by skipping the division visualizes the noise correctly.
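A minimal sketch of the guarded normalization described above (the helper name is hypothetical; it mirrors the notebook's min-max scaling to a displayable range):

```python
import torch

def normalize_for_display(noise: torch.Tensor) -> torch.Tensor:
    """Min-max normalize a perturbation to [0, 1] for plotting,
    guarding the degenerate case where min == max (e.g. all-zero noise)."""
    lo, hi = noise.min(), noise.max()
    if hi == lo:
        # Constant noise: return mid-gray instead of dividing by zero.
        return torch.full_like(noise, 0.5)
    return (noise - lo) / (hi - lo)
```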

For the performance issue, may I clarify whether your validation accuracy is computed on clean images? I can confirm that the noise generated by your ResNet-18 model also works well on a torchvision ResNet-18.

For the timing issue, I used Google Colab (Tesla T4), so the slowdown may simply come down to the specific GPU. While mlconfig is not the culprit, I suggest either removing it from the __init__ file to avoid import errors for people cloning your repo, or listing it in a requirements file.
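For example, a minimal requirements file (package names assumed from the repo's imports; versions unpinned):

```
# requirements.txt (sketch)
torch
torchvision
mlconfig
```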

HanxunH commented 3 years ago

Thanks for your interest in our work.

Yes, the validation accuracy is computed on clean images. There may be slight variations between your noise generation and ours, but using torchvision.models.resnet18 to generate the noise is less effective.

Thanks for your suggestion on mlconfig. I will update the repo.