Try two things: 1. increase the batch size; 2. for the first 10-20 epochs, train the network on the clean dataset, then use PGD training.
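In case it helps, here is a minimal sketch of that warm-up schedule in PyTorch. The model, loader, optimizer, and all hyperparameters (eps = 8/255, 7 PGD steps, 15 warm-up epochs) are placeholders chosen for illustration, not the configuration used in the paper, and the sketch assumes inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    """L-infinity PGD attack; eps/alpha/steps are common defaults, not the paper's."""
    # Start from a random point inside the eps-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient-sign step, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def train(model, loader, optimizer, device, epochs=100, warmup_epochs=15):
    """Train on clean data for `warmup_epochs`, then switch to PGD training."""
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if epoch >= warmup_epochs:
                # After the warm-up phase, replace clean inputs with PGD examples.
                x = pgd_attack(model, x, y)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            optimizer.step()
```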
Sorry, could you give more details? For example, what batch size did you use?
It depends on the model and the GPU memory you have access to. All of the experiments in the paper were run on 32 1080 Ti GPUs; e.g., for RobNet_large the batch size is 512, and for RobNet_free the batch size is 320. (The batch size may differ if you use the model file in this repo, since before submitting the paper we used the supernet with masks forming RobNet_large and RobNet_free to obtain the results on Tiny-ImageNet.)
Thank you.
Excuse me, we have failed to adversarially train the RobNet family on Tiny-ImageNet from scratch: neither the clean accuracy nor the adversarial accuracy on the validation set rises at all. Only the learning-rate decay strategy is mentioned in your article. Could you please share the full training configuration for Tiny-ImageNet?