Closed rshaojimmy closed 3 years ago
Right. The evaluation code can only report the accuracy under each attack. If you want to calculate the per-example accuracy, you need to save the classification results for each test sample against all attacks, and then count the number of samples that are classified correctly under all attacks.
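A minimal sketch of that counting step, using hypothetical per-attack correctness masks (the attack names and array shapes here are illustrative, not from the repo):

```python
import numpy as np

# Hypothetical correctness masks saved during evaluation:
# correct[i, j] is True when test sample j is classified correctly
# under attack i (shape: num_attacks x num_samples).
correct = np.array([
    [True,  True,  False, True],   # e.g. PGD
    [True,  False, False, True],   # e.g. CW
    [True,  True,  True,  True],   # e.g. FGSM
])

# Per-attack accuracy: the numbers the evaluation code already reports.
per_attack_acc = correct.mean(axis=1)

# Per-example robust accuracy: a sample counts only if it is
# classified correctly under *all* attacks.
per_example_acc = correct.all(axis=0).mean()

print(per_attack_acc)   # [0.75 0.5  1.  ]
print(per_example_acc)  # 0.5
```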
I see, thanks a lot.
By the way, may I ask how you choose the training epoch at which to stop training without a validation set?
We adopted the setting in TRADES (use the 75th epoch checkpoint). The paper "Overfitting in Adversarially Robust Deep Learning" discusses more on this.
I see, thanks. One last question: in the `adt_expam` method, I find you calculate the mean and std of `rand_noise` using the channels of `phi` as follows: `adv_mean = phi[:, :3, :, :]`, `adv_std = F.softplus(phi[:, 3:, :, :])`.
But it is hard to say that the first 3 channels of `phi` correspond to the input image, right? So I tried: `adv_mean = phi`, `adv_std = F.softplus(phi)`.
The result degrades a lot. Could you help me figure out why?
Thanks.
You cannot get adv_mean and adv_std with a single output phi because they should not be correlated. It's natural to use the first three output channels to get adv_mean and the last three channels to get adv_std since they will depend on different weights in the output layer. This way of getting adv_mean and adv_std is similar to training variational auto-encoders.
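A small sketch of that channel split and the subsequent reparameterized sampling, written in NumPy for self-containment (the 6-channel output shape and the sampling step are illustrative assumptions, not the repo's exact code):

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

# Hypothetical generator output with 6 channels for a 3-channel image:
# the first 3 channels parameterize the mean, the last 3 the std,
# so the two depend on different weights in the output layer.
rng = np.random.default_rng(0)
phi = rng.standard_normal((2, 6, 8, 8))  # (batch, channels, H, W)

adv_mean = phi[:, :3, :, :]              # mean from the first 3 channels
adv_std = softplus(phi[:, 3:, :, :])     # positive std from the last 3

# Reparameterization, as in VAE training: sample standard noise and
# shift/scale it so gradients can flow through adv_mean and adv_std.
eps = rng.standard_normal(adv_mean.shape)
rand_noise = adv_mean + adv_std * eps

assert rand_noise.shape == (2, 3, 8, 8)
assert (adv_std > 0).all()
```

Using `adv_mean = phi` and `adv_std = F.softplus(phi)` ties the two to the same output values, which removes that independence and explains the degradation.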
Got it, thanks a lot. I may close this issue.
First of all, I would like to thank you for this incredible work!
It seems that you only report the robustness evaluation per attack in testing. May I know how you calculate the per-example accuracy (*A_rob* in the paper)?
Thanks.