Closed VishaalMK closed 6 years ago
It is in general not true that for every image and every budget there exists an adversarial example. Consider the trivial case where the network is the constant function classifying everything as a 6: no adversarial examples would exist for any digit 6. The one-pixel attack paper, for example, only succeeds something like 70% of the time on CIFAR.
There has been no research in this space to the best of my knowledge. JSMA, this L0 attack, and the one-pixel attack are the only L0 attacks I know of, and they all try to minimize distortion, not maximize error. These attacks are greedy algorithms, in that if they select the wrong pixel (for JSMA, to add; for this one, to remove) then that choice can never be undone. The simplest thing to try would be to set a budget on the number of pixels that can be changed, use the current algorithm to figure out which pixels should be changed, and then remove the max (i.e., set the confidence to +inf) and just solve until it converges on a minimum.
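A minimal sketch of the budget-constrained idea above, on a toy linear classifier rather than the actual `l0_attack` code (the names `budget_l0_attack` and the per-pixel gain heuristic are my own illustration, not the repository's implementation):

```python
import numpy as np

def budget_l0_attack(x, w, budget, lo=0.0, hi=1.0):
    """Greedy budget-aware L0 sketch on a toy linear model score = w @ x.

    Rank pixels by how much pushing them to their extreme value lowers
    the score, keep only the top `budget` pixels, and perturb just those.
    This mirrors the idea of first identifying which pixels matter, then
    spending the entire budget on them.
    """
    # Score decrease achievable per pixel: push to lo if w[i] > 0, to hi otherwise.
    gain = np.where(w > 0, w * (x - lo), w * (x - hi))
    keep = np.argsort(gain)[-budget:]            # the `budget` most effective pixels
    x_adv = x.copy()
    x_adv[keep] = np.where(w[keep] > 0, lo, hi)  # push each kept pixel to its extreme
    return x_adv

rng = np.random.default_rng(0)
w = rng.normal(size=16)        # toy classifier weights
x = rng.uniform(size=16)       # toy "image", pixels in [0, 1]
x_adv = budget_l0_attack(x, w, budget=4)
print(np.count_nonzero(x_adv != x), float(w @ x_adv) < float(w @ x))
```

As in the greedy description above, this cannot undo a bad pixel choice; a real implementation would re-solve the C&W objective after fixing the allowed pixel set.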
Thank you for your detailed reply.
@carlini: Thanks again for your insightful comment! It helped me design a variant of the L_0 attack that takes in a given number of pixels allowed for perturbation, dubbed *Budget-aware C&W L0 attack* in our latest work: VectorDefense. We have also acknowledged your help in the paper :)
If time permits, any comments you have would be greatly appreciated. Thanks!
I was able to modify your algorithm (`l0_attack`) to produce an adversarial example (AX) that modifies only a specific number of pixels (say, a budget setting). The results were:
My questions are:
I'm not sure whether the formulation above makes complete sense (under the constraints of your algorithm). Could you suggest any pointers on this, or say whether it is even possible? Thanks for your time.