MadryLab / cifar10_challenge

A challenge to explore adversarial robustness of neural networks on CIFAR10.
MIT License

Image out of valid range for the first iteration of PGD attack #4

Closed vipinpillai closed 6 years ago

vipinpillai commented 6 years ago

Hi,

I noticed that the image fed to the model to obtain the gradients for the first iteration of the PGD attack is not clipped to the valid pixel range [0, 255].

Here, random noise is added to the original image and the resulting image is directly fed to the network for the first iteration without clipping.
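The issue and its fix can be sketched in a few lines. This is a minimal NumPy illustration of PGD with a random start, not the repository's actual TensorFlow implementation; `grad_fn` is a hypothetical stand-in for the model's loss gradient, and the epsilon/step values are just examples:

```python
import numpy as np

def pgd_attack(x_nat, grad_fn, epsilon=8.0, step_size=2.0, num_steps=10):
    """L-infinity PGD with random start (NumPy sketch)."""
    # Random start inside the L-inf ball around the natural image.
    x = x_nat + np.random.uniform(-epsilon, epsilon, x_nat.shape)
    # The fix discussed in this issue: clip BEFORE the first gradient step,
    # so the gradient is computed at a valid image.
    x = np.clip(x, 0.0, 255.0)
    for _ in range(num_steps):
        g = grad_fn(x)                                     # loss gradient w.r.t. input
        x = x + step_size * np.sign(g)                     # ascent step on the loss
        x = np.clip(x, x_nat - epsilon, x_nat + epsilon)   # project back into the ball
        x = np.clip(x, 0.0, 255.0)                         # keep pixels valid
    return x
```

Without the clip after the random start, the first `grad_fn(x)` call can receive pixel values outside [0, 255].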

dtsip commented 6 years ago

Oh, you are right. Just fixed it. This doesn't really affect our results since clipping just makes the attack weaker. Thanks for noticing!

vipinpillai commented 6 years ago

@dtsip Actually, clipping makes the attack stronger, albeit very marginally. Because the unclipped image is not in the valid pixel range, the gradients for the first iteration may not be very meaningful. I verified the attack results both with and without clipping for the adversarially trained model on CIFAR10: the white-box attack accuracy dropped slightly, from 48.16% to 48.09%. I didn't verify the numbers across multiple restarts, though.