MadryLab / robustness

A library for experimenting with, training and evaluating neural networks, with a focus on adversarial robustness.
MIT License

Robust SqueezeNet for ImageNet #89

Open GlebSBrykin opened 3 years ago

GlebSBrykin commented 3 years ago

At the moment, I have managed to run your library on my computer with a GPU, and I would like to train a robust SqueezeNet 1.1 on the ImageNet dataset. But I ran into a problem: ImageNet is no longer available for download. I managed to download the training and validation parts from Academic Torrents, but I couldn't find the devkit archive. Please upload this archive here; it is only 2.5 MB. Without it, it is impossible to start training...☹️

dtsip commented 3 years ago

As far as I know, ImageNet is still available for download (under certain terms and conditions) from here.

That being said, I do not fully understand why the training and validation images are not enough to train a model.

GlebSBrykin commented 3 years ago

As I understand it, that file contains the annotations for the validation part of the dataset, so it is impossible to evaluate the model without it. But the problem appears to be solved: robustness does not use torchvision.datasets.ImageNet. I am now preparing the datasets for robustness.
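For reference, this is roughly how I am pointing robustness at the prepared folders (an untested sketch; the path is mine, and the layout is the standard ImageFolder train/val structure the library expects):

```python
# Sketch: load a locally prepared ImageNet copy with robustness.
# Assumes /data/imagenet/train/<wnid>/*.JPEG and /data/imagenet/val/<wnid>/*.JPEG
from robustness import datasets, model_utils

ds = datasets.ImageNet('/data/imagenet')          # path is an assumption
model, _ = model_utils.make_and_restore_model(arch='squeezenet1_1', dataset=ds)
train_loader, val_loader = ds.make_loaders(workers=4, batch_size=4)
```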

GlebSBrykin commented 3 years ago

Well, I managed to prepare the dataset and start training through the robustness CLI. But there is a problem: the maximum batch size that fits is 4, and that is with SqueezeNet 1.1, 8 GB of RAM, and 3 GB of video memory. Why is memory consumption so high? Is there any way to fix this? One epoch on ImageNet takes 24 hours, and it would be nice to increase the batch size.

dtsip commented 3 years ago

You can measure the number of model parameters and the number of activations produced during the forward and backward passes to see directly what is consuming memory.

Unfortunately, we do not have the capacity to investigate this, especially since it is not an issue directly related to the library, but rather to standard DNN training.
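If it helps, here is a rough way to do that measurement in plain PyTorch (an untested sketch, separate from the library; the batch size of 4 mirrors your report):

```python
# Count parameters and estimate per-forward-pass activation memory.
import torch
from torchvision.models import squeezenet1_1

model = squeezenet1_1()
n_params = sum(p.numel() for p in model.parameters())
print(f'parameters: {n_params:,} (~{4 * n_params / 2**20:.1f} MB as float32)')

activation_bytes = 0
def count_activations(module, inputs, output):
    global activation_bytes
    if isinstance(output, torch.Tensor):
        activation_bytes += output.numel() * output.element_size()

# Hook only leaf modules to avoid double-counting container outputs.
leaves = [m for m in model.modules() if len(list(m.children())) == 0]
handles = [m.register_forward_hook(count_activations) for m in leaves]
with torch.no_grad():
    model(torch.randn(4, 3, 224, 224))  # batch size 4, as reported above
for h in handles:
    h.remove()
print(f'activations: ~{activation_bytes / 2**20:.1f} MB per forward pass')
```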

GlebSBrykin commented 3 years ago

Okay, then I'll rephrase the question a little. Does robust training put an extra load on memory compared to standard training?

dtsip commented 3 years ago

Nope. Robust training requires additional passes through the model, but the memory footprint is essentially the same. (When using use_best for PGD, an additional copy of the input is stored, but this is tiny compared to the cost of storing the model activations, which is where the real memory consumption happens.)
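To make this concrete, here is a minimal standalone L2 PGD sketch (hypothetical code, not the library's internal implementation). Each attack step runs its own forward/backward pass, and the activations from that pass are freed once the input gradient is computed, so peak memory matches a single training step; only the perturbed input itself persists across steps:

```python
import torch
import torch.nn.functional as F

def l2_pgd(model, x, y, eps=3.0, step_size=0.5, steps=7):
    """Projected gradient ascent on the loss within an L2 ball (inputs in [0, 1])."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)  # activations freed after this
        with torch.no_grad():
            g = grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = x_adv + step_size * g                    # ascend the loss
            delta = x_adv - x
            norms = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = (x + delta * (eps / norms).clamp(max=1.0)).clamp(0, 1).detach()
    return x_adv
```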

GlebSBrykin commented 3 years ago

Good. Then what settings would you recommend I use? I mean the learning rate, etc. I will train on the standard ImageNet.

GlebSBrykin commented 3 years ago

And what's more, is it possible to compute different parts of the network on different devices? For example, for VGG19: the convolutional part computed on the GPU, and the fully connected part on the CPU.

dtsip commented 3 years ago

We typically train robust models with the same parameters as their standard versions, so I would start with the parameters used to train a standard SqueezeNet on ImageNet.
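Something like the following, via the Python API (an untested sketch; the path, learning rate, and attack settings here are illustrative starting points, not tuned recommendations):

```python
import cox.store
from cox.utils import Parameters
from robustness import datasets, defaults, model_utils, train

ds = datasets.ImageNet('/data/imagenet')                     # assumed path
model, _ = model_utils.make_and_restore_model(arch='squeezenet1_1', dataset=ds)
train_loader, val_loader = ds.make_loaders(workers=4, batch_size=4)

store = cox.store.Store('/tmp/squeezenet_robust')            # assumed output dir
args = Parameters({
    'out_dir': '/tmp/squeezenet_robust',
    'adv_train': 1,        # adversarial (robust) training
    'constraint': '2',     # L2 threat model
    'eps': 3.0,            # illustrative; match your threat model
    'attack_lr': 0.5,
    'attack_steps': 7,
    'lr': 0.04,            # initial LR from the SqueezeNet paper (assumption)
    'epochs': 90,
})
args = defaults.check_and_fill_args(args, defaults.TRAINING_ARGS, datasets.ImageNet)
args = defaults.check_and_fill_args(args, defaults.PGD_ARGS, datasets.ImageNet)
train.train_model(args, model, (train_loader, val_loader), store=store)
```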

Yes, it should be possible to use different devices. It might require modifying the training code a bit, though, since this is not a typical use case.
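As a starting point, a split like this works in plain PyTorch (an untested sketch with torchvision's VGG19; the device placement and manual .to() transfer are the only changes):

```python
import torch
from torchvision.models import vgg19

model = vgg19()
model.features.to('cuda')     # convolutional part on the GPU
model.avgpool.to('cuda')
model.classifier.to('cpu')    # fully connected part on the CPU

def split_forward(x):
    x = model.avgpool(model.features(x.to('cuda')))
    x = torch.flatten(x, 1).to('cpu')   # move activations between devices
    return model.classifier(x)

logits = split_forward(torch.randn(1, 3, 224, 224))
```

Autograd handles the cross-device transfer, so backward passes (including attack steps) still work; you would need to wire this forward function into the training loop yourself.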

GlebSBrykin commented 3 years ago

So, I keep trying to train the robust SqueezeNet. I decided to use RestrictedImageNet due to limited resources, and I have a problem: the robust SqueezeNet 1.1 seems to behave incorrectly. When I try to use it for style transfer, the loss is always nan, even at the first iteration, before the image update and the optimizer step. The same code was tested with a regular SqueezeNet from the PyTorch repository, and no problems were observed.

I also do not know how normal it is that the training loss decreases from 1.6000 to 1.5500 per epoch; in my opinion, this is too little. The parameters are as follows: lr = 0.01, attack-lr = 0.05, attack-steps = 7, eps = 3.0, batch-size = 4, constraint = 2.

And one more question: is it possible to extract from ImageNet only the data that is used for training on RestrictedImageNet? I'd like to train the model in Google Colab.
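In case it is useful to others, my current plan for extracting the subset looks like this (untested; the class index ranges are copied from robustness.datasets.RestrictedImageNet and should be checked against the installed version, and the paths are mine):

```python
import os, shutil

# Class-index ranges for RestrictedImageNet (copied from the library; verify).
RANGES = [(151, 268),  # dog
          (281, 285),  # cat
          (30, 32),    # frog
          (33, 37),    # turtle
          (80, 100),   # bird
          (365, 382),  # primate
          (389, 397),  # fish
          (118, 121),  # crab
          (300, 319)]  # insect

SRC, DST = '/data/imagenet', '/data/restricted_imagenet'  # assumed paths
for split in ('train', 'val'):
    # ImageFolder assigns class indices by sorted folder (wnid) order.
    wnids = sorted(os.listdir(os.path.join(SRC, split)))
    keep = [w for lo, hi in RANGES for w in wnids[lo:hi + 1]]
    for wnid in keep:
        shutil.copytree(os.path.join(SRC, split, wnid),
                        os.path.join(DST, split, wnid))
```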