symoon11 closed this issue 4 years ago
It was a matter of implementation convenience, and we didn't look into it at the time. Our goal was to get a robust model, and since we test in 'eval' mode, our results are unaffected by this. That said, we later trained models with BN in 'eval' mode and saw little difference. Conceptually, it should not have a huge impact.
Could you explain why, during training, you used the gradient of a model in 'train' mode to generate PGD adversarial examples? It seems unnatural, since batch normalization in 'train' mode could hinder generating 'real' adversarial examples. Thanks.
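
For readers following the thread, here is a minimal sketch of the batch normalization behavior the question is about. It is not code from this repository: `batch_norm`, `RUNNING_MEAN`, `RUNNING_VAR`, and the toy batches are hypothetical names and values chosen for illustration, and the layer is reduced to a single feature with no learned scale or shift. In 'train' mode the normalized value of one example depends on the other examples in the batch, so the gradient used by an attack would too. In 'eval' mode it depends only on the fixed running statistics.

```python
import math


def batch_norm(batch, running_mean, running_var, training, eps=1e-5):
    """Normalize a list of scalars (one feature, no learned scale/shift)."""
    if training:
        # 'train' mode: statistics come from the current batch.
        mean = sum(batch) / len(batch)
        var = sum((v - mean) ** 2 for v in batch) / len(batch)
    else:
        # 'eval' mode: statistics are the fixed running estimates.
        mean, var = running_mean, running_var
    return [(v - mean) / math.sqrt(var + eps) for v in batch]


# Hypothetical running statistics, assumed to come from earlier training.
RUNNING_MEAN, RUNNING_VAR = 0.0, 1.0

x = 0.5                          # the example being attacked
batch_a = [x, 0.1, -0.3, 0.2]    # x alongside small-valued examples
batch_b = [x, 2.0, 3.0, 4.0]     # x alongside large-valued examples

train_a = batch_norm(batch_a, RUNNING_MEAN, RUNNING_VAR, training=True)[0]
train_b = batch_norm(batch_b, RUNNING_MEAN, RUNNING_VAR, training=True)[0]
eval_a = batch_norm(batch_a, RUNNING_MEAN, RUNNING_VAR, training=False)[0]
eval_b = batch_norm(batch_b, RUNNING_MEAN, RUNNING_VAR, training=False)[0]

print("train mode:", train_a, train_b)  # differ, even in sign
print("eval mode: ", eval_a, eval_b)    # identical
```

In this toy case the same input is normalized to a positive value in one batch and a negative value in the other under 'train' mode, while 'eval' mode gives the same value both times. Whether that batch dependence matters for robustness in practice is a separate question; the reply above reports little difference.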