carlini / nn_robust_attacks

Robust evasion attacks against neural networks to find adversarial examples
BSD 2-Clause "Simplified" License

Possibility of "fixing" the number of pixels to be modified - L0 attack for MNIST #19

Closed VishaalMK closed 6 years ago

VishaalMK commented 6 years ago

I was able to modify your algorithm (l0_attack) to produce an adversarial example (AX) that modifies only a specific number of pixels (say, a budget setting; sketched after the list below). The results were:

  1. For an arbitrary (input, target) pair the attack was not always successful; as intuition suggests, the lower the pixel budget, the lower the success rate.
  2. Conversely, the fewer the pixels modified, the larger the per-pixel intensity change.
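
For concreteness, here is a minimal sketch of the budget-capped loop I mean. `l2_solver` is a hypothetical stand-in for a masked C&W-style inner solve (it perturbs only pixels where the mask is 1 and returns None on failure), and the pixel-importance score is simplified to raw perturbation magnitude rather than the gradient-weighted score from the paper:

```python
import numpy as np

def budget_l0_attack(l2_solver, x, target, budget):
    # mask == 1 marks pixels that may still be perturbed
    mask = np.ones_like(x)
    while True:
        # masked inner solve (hypothetical): returns an adv image or None
        adv = l2_solver(x, target, mask)
        if adv is None:
            return None  # attack failed before the budget was reached
        if mask.sum() <= budget:
            return adv   # success: adv touches at most `budget` pixels
        # freeze the still-active pixel whose change is smallest (a
        # simplified stand-in for the gradient-weighted importance score)
        score = np.where(mask == 1, np.abs(adv - x), np.inf)
        mask[np.unravel_index(np.argmin(score), x.shape)] = 0
```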

My questions are:

  1. Is it possible to achieve a successful AX for an arbitrary (input, target) pair under any budget setting (say, like a one-pixel attack) using your L0 algorithm?
  2. If so, how can we make the attack (AX) the strongest under that particular budget setting? E.g., if the budget were all 784 pixels of an MNIST image, I assume we could make the AX stronger by increasing the confidence value.

I'm not sure whether the formulation above makes complete sense (under the constraints of your algorithm). Could you give me some pointers on this, or tell me whether it is even possible at all? Thanks for your time.

carlini commented 6 years ago
  1. It is in general not true that for every image and every budget there exists an adversarial example. Consider the trivial case where the network is the constant function classifying everything as a 6: no adversarial examples would exist for any digit 6. The one-pixel attack paper, for example, only succeeds something like 70% of the time on CIFAR.

  2. There has been no research in this space to the best of my knowledge. JSMA, this L0 attack, and the one-pixel attack are the only L0 attacks I know of, and they all try to minimize distortion rather than maximize error. These attacks are greedy algorithms, in that if they select the wrong pixel (for JSMA, to add; for this one, to remove), that choice can never be undone. The simplest thing to try would be to set a budget on the number of pixels that can be changed, use the current algorithm to figure out which pixels should be changed, and then remove the max (i.e., set the confidence to +inf) and just solve until it converges on a minimum.
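
A minimal sketch of that last step, under stated assumptions: `grad_margin` is a hypothetical callable (not part of this repo) returning the input gradient of the targeted margin Z(x')_t − max_{i≠t} Z(x')_i, `mask` is the 0/1 pixel set found by the budgeted attack, and pixel values are assumed to lie in [0, 1]:

```python
import numpy as np

def strengthen_in_mask(grad_margin, x, mask, lr=0.01, steps=1000):
    """Gradient ascent on the targeted margin, restricted to a fixed mask.

    With the confidence effectively set to +inf there is no early stop:
    we keep pushing the margin up until the step budget runs out.
    """
    adv = x.copy()
    for _ in range(steps):
        g = grad_margin(adv)  # hypothetical: d(margin)/d(input)
        # only the budgeted pixels are allowed to move
        adv = np.clip(adv + lr * g * mask, 0.0, 1.0)
    return adv
```

Run to convergence with the mask held fixed, this should yield roughly the strongest example that particular pixel set can support.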

VishaalMK commented 6 years ago

Thank you for your detailed reply.

VishaalMK commented 6 years ago

@carlini: Thanks again for your insightful comment! It helped me design a variant of the L0 attack that takes a given number of pixels allowed for perturbation, dubbed the _Budget-aware C&W L0 attack_ in our latest work, VectorDefense. We have also acknowledged your help in the paper :)

If time permits, any comments you have would be greatly appreciated. Thanks!