IBM / Autozoom-Attack

Code for reproducing the query-efficient black-box attacks in "AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks", published at AAAI 2019.
https://arxiv.org/abs/1805.11770
Apache License 2.0

Question about the modifiers #4

Closed · dongyp13 closed this issue 4 years ago

dongyp13 commented 5 years ago

Hi,

I'm running the code on ImageNet for untargeted attacks. I made sure that the original images are correctly classified. After attacking some images, I found that the next image is sometimes already adversarial (an initial success occurs at the first iteration).

Is this because the modifier is not reset after attacking an image? I suspect the leftover noise stays adversarial for new images, acting like a "universal perturbation".

If so, do I need to reset the modifier after attacking each image? For concreteness, the kind of per-image reset I have in mind is sketched below.
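
This is only a hypothetical sketch, not the repo's actual code: `run_attack` stands in for the attack routine, and the image shape is assumed.

```python
import numpy as np

def attack_all(images, run_attack, shape=(1, 299, 299, 3)):
    """Attack each image with a freshly zeroed modifier.

    `run_attack` is a hypothetical stand-in for the per-image attack;
    the point is only that `modifier` is re-zeroed before every image,
    so perturbation from the previous image cannot carry over.
    """
    results = []
    modifier = np.zeros(shape, dtype=np.float32)
    for img in images:
        modifier.fill(0.0)  # reset per image to avoid carry-over
        results.append(run_attack(img, modifier))
    return results
```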

pinyuchen commented 5 years ago

Hello dongyp13,

We don't think the modifier is the reason for the initial success at the first iteration. In fact, it is not uncommon to find an initial success after the first iteration (although it might have large distortion). For example, the well-known fast gradient sign method (FGSM) can find adversarial perturbations in just one iteration.
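
For reference, here is a minimal single-step FGSM sketch, written in PyTorch purely for illustration (this repo itself uses TensorFlow); a single gradient-sign step of size `eps` is often enough to flip the prediction, which is why an initial success on the very first iteration is not surprising.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    """One-step FGSM: perturb x by eps in the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # single gradient-sign step
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in valid range
```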