fra31 / sparse-rs

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks
https://arxiv.org/abs/2006.12834
MIT License

Possible to change the domain allowed for L_0 attacks since they behave more like L_inf? #3

Closed mbeliaev1 closed 3 years ago

mbeliaev1 commented 3 years ago

Hope you are well. Thanks for your great work on this GitHub repo for your paper on Sparse-RS attacks. I am a PhD student at UCSB; my group has been using your framework to numerically validate our L_0-robust classifier. We were previously using the pointwise attack, and your implementation is indeed much faster, but I am having one issue.

How can I easily change the domain allowed for your attack?

From the paper, as well as what I see in the code, the actual perturbations are bounded to [0, 1]. The paper says they are bounded to the domain of the input data, but this does not seem to be the case: if I input normalized MNIST data (which has a domain of roughly [-2.8, 2.8]), the pixels attacked by sparse_rs are all set to either 0 or 1.

We would like to use your framework for our paper, but the attacks seem rather weak when bounded this way, since they are then essentially L_inf-bounded as well. Am I missing something in my implementation, or is this expected, and should I scale my image domain to [0, 1] to use your attack? Clearly any computer-generated L_0 attack is also L_inf-bounded, but I would at least like to control that bound without having to retrain models that work on the [0, 1] domain.

fra31 commented 3 years ago

Hi,

glad that you find our attack useful!

The current implementation assumes that the network takes inputs in [0, 1]^d. I think the easiest way to adapt it to normalized inputs is to include the normalization in the forward pass. An option we used for the ImageNet models is here, and another example here. In this way you create a model which takes inputs in [0, 1]^d, so that the attack fully exploits the input domain, and applies the normalization internally.
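A minimal sketch of such a wrapper (the linked examples may differ in the details; the MNIST mean/std in the comment are the commonly used values and should be adapted to your setup):

```python
import torch
import torch.nn as nn

class NormalizedModel(nn.Module):
    """Takes inputs in [0, 1]^d and applies the dataset normalization
    internally before calling the wrapped classifier."""
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        # register as buffers so the constants follow .to(device) / .cuda()
        self.register_buffer("mean", torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        # x is expected in [0, 1]; normalization happens inside the forward pass
        return self.model((x - self.mean) / self.std)

# e.g. for MNIST (0.1307 / 0.3081 are the commonly used statistics):
# wrapped = NormalizedModel(mnist_classifier, mean=[0.1307], std=[0.3081])
```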

Let me know if this works!

mbeliaev1 commented 3 years ago

Yes thank you! I was not sure if this was intended, but you cleared it up.

Quick question: it seems like you scale your dataset so that the input domain's MIN/MAX map to 0/1. Is there any reason for this other than consistency with other benchmarks? My intuition is that if you mapped MIN/MAX to 0/Beta, with Beta < 1, the attack would be stronger. Of course the new adversarial images cannot be visualized without clipping, but they are still L_0-bounded.

fra31 commented 3 years ago

We produce inputs in [0, 1]^d so that they belong to the image domain. We use these values since they are quite general and used by many models. In general, the attack is agnostic to the specific values: the idea is to perturb a pixel as much as possible, that is, to one of the extremes of the interval [a, b] of values that pixel can take. In our case a = 0 and b = 1, but other choices are possible. Perturbing an image outside the valid domain would give a stronger attack, since one would remove some constraints, but it would also produce inputs which are not supposed to be classified by the model, since they are not images.
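To illustrate the principle (this is not the actual Sparse-RS sampling scheme, just the idea of pushing the k attacked pixels to the extremes of [a, b]):

```python
import torch

def perturb_k_pixels(x, k, a=0.0, b=1.0):
    # Illustrative only: given a single image x of shape [C, H, W] with values
    # in [a, b], set k randomly chosen pixel locations to the interval extreme
    # farthest from their current value, i.e. the largest per-pixel change
    # achievable within the L_0 budget k.
    x_adv = x.clone()
    _, h, w = x.shape
    idx = torch.randperm(h * w)[:k]
    rows = torch.div(idx, w, rounding_mode="floor")
    cols = idx % w
    cur = x_adv[:, rows, cols]                      # shape [C, k]
    x_adv[:, rows, cols] = torch.where(cur - a > b - cur,
                                       torch.full_like(cur, a),
                                       torch.full_like(cur, b))
    return x_adv
```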

mbeliaev1 commented 3 years ago

Thanks again! I was able to get what I wanted out of the framework, and the attacks work much better now.

Just FYI, if anyone ever has trouble with this:
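A minimal sketch of the workaround, assuming the usual MNIST statistics (mean 0.1307, std 0.3081), a classifier `model` trained on normalized inputs, and an already-normalized batch `x_norm` (all names and values here are illustrative):

```python
# 1) map the normalized batch back to the image domain [0, 1]
x_01 = (x_norm * 0.3081 + 0.1307).clamp(0.0, 1.0)

# 2) expose a forward pass that accepts [0, 1] inputs and normalizes internally
def model_01(x):
    return model((x - 0.1307) / 0.3081)

# 3) run Sparse-RS on model_01 with x_01: the attack then searches the full
#    [0, 1] domain, and the perturbed images map back to valid MNIST inputs
```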