Using the tool for general interval of input region

shubhamugare commented 1 year ago

Hello,

I've been trying to use the tool to find adversarial examples where the perturbation region is not uniform, as it is in the $L_\infty$ case. Suppose I have general per-component bounds LB and UB on the values the input can take. My goal is to find adversarial examples inside such a region.

What I tried: I added a linear transformation layer on top of my network that translates the input from the [LB, UB] range to the [-1, 1] range. I then chose eps=1 in AutoAttack to find an adversarial example.
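A minimal NumPy sketch of the reparametrisation described above, assuming the added layer maps a search variable z in [-1, 1]^dim back onto the box [LB, UB] before the original network sees it. The helper names `sym_to_box`, `box_to_sym` and `wrap_model` are hypothetical; with AutoAttack the same composition would sit inside the forward pass of a PyTorch model, and this version only illustrates the arithmetic.

```python
import numpy as np

def sym_to_box(z, lb, ub):
    """Map z in [-1, 1]^d to x in [lb, ub], componentwise (hypothetical helper)."""
    center = (lb + ub) / 2.0
    radius = (ub - lb) / 2.0
    return center + radius * z

def box_to_sym(x, lb, ub):
    """Inverse map from [lb, ub] to [-1, 1]^d; assumes ub > lb in every component."""
    center = (lb + ub) / 2.0
    radius = (ub - lb) / 2.0
    return (x - center) / radius

def wrap_model(f, lb, ub):
    """Hypothetical wrapper: the attack queries g(z) = f(sym_to_box(z, lb, ub))."""
    return lambda z: f(sym_to_box(z, lb, ub))

# Illustrative bounds for a 3-dimensional input.
lb = np.array([0.0, 0.2, 0.5])
ub = np.array([1.0, 0.4, 0.9])
corners_and_center = sym_to_box(np.array([-1.0, 0.0, 1.0]), lb, ub)
```

Here z = -1 lands on LB, z = 0 on the centre of the box and z = 1 on UB.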

This didn't perform well. For testing purposes, I added the same linear transformation layer to an $L_\infty$ problem that I could already test with AutoAttack.

On an MNIST network with eps=0.1, AutoAttack found adversarial examples for 47 inputs in normal mode. With the same config plus the additional linear layer described above, so that the search region of the input layer is now [-1, 1]^dim, it found only 7 adversarial examples.

I am curious how hard this problem is to solve. Also, why does generating an attack become so much harder for exactly the same problem when I add a linear transformation?
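One possible explanation, sketched under the assumption that the attacks clip every iterate to [0, 1]^d (the reply below notes that they expect images in that range): if the original image sits at the centre of the box, its reference point in [-1, 1]^dim is z0 = 0, and the eps=1 ball [-1, 1] intersected with [0, 1] leaves only [0, 1] per component, i.e. only perturbations towards UB. The one-pixel example below uses hypothetical values and helper names to compute that reachable range.

```python
import numpy as np

def effective_z_range(z0, eps, clip_lo=0.0, clip_hi=1.0):
    """Intersect the L-inf ball [z0 - eps, z0 + eps] with the assumed clipping box."""
    return np.maximum(z0 - eps, clip_lo), np.minimum(z0 + eps, clip_hi)

def sym_to_box(z, lb, ub):
    """Map z in [-1, 1]^d to x in [lb, ub] (same affine map as the added layer)."""
    return (lb + ub) / 2.0 + (ub - lb) / 2.0 * z

# Toy MNIST-like pixel: x0 = 0.5 with eps = 0.1, so the intended box is [0.4, 0.6].
x0 = np.array([0.5])
lb, ub = x0 - 0.1, x0 + 0.1
z0 = np.zeros_like(x0)  # x0 is the centre of the box, hence z0 = 0
z_lo, z_hi = effective_z_range(z0, eps=1.0)

# Map the reachable z-range back to pixel space: roughly [0.5, 0.6] instead of [0.4, 0.6].
x_lo = sym_to_box(z_lo, lb, ub)
x_hi = sym_to_box(z_hi, lb, ub)
```

Under this assumption half of each coordinate's interval is never explored, which would be consistent with fewer adversarial examples being found.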

Thanks!

ScarlettChan commented 1 year ago

Hello, your email has been received!

fra31 commented 1 year ago

Hi,

the attacks expect images in [0, 1]^d, so I'd say you need to add a linear map [0, 1] -> [LB, UB] (applied to every component of the input) to the forward pass of the network. I'm not sure about your use case, but in principle you would also need to compute the point in [0, 1]^d that corresponds to the original image in [LB, UB] and use it as the reference point for the attack.
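A minimal NumPy sketch of the suggested setup, assuming lb <= ub componentwise (components with lb == ub are fixed and mapped to 0). `unit_to_box` is the map [0, 1] -> [LB, UB] to compose with the network, and `box_to_unit` computes the reference point z0 for the original input. The function names and the choice eps = 1 in the rescaled space (which, after clipping to [0, 1]^d, covers the whole box) are assumptions for illustration; in practice the composition would go inside the forward pass of the PyTorch model handed to the attack.

```python
import numpy as np

def unit_to_box(z, lb, ub):
    """Map z in [0, 1]^d to x in [lb, ub], componentwise."""
    return lb + (ub - lb) * z

def box_to_unit(x, lb, ub):
    """Reference point: the z in [0, 1]^d corresponding to x in [lb, ub].
    Components with ub == lb are fixed, so they map to 0."""
    width = ub - lb
    safe = np.where(width > 0, width, 1.0)  # avoid dividing by zero
    return np.where(width > 0, (x - lb) / safe, 0.0)

def wrap_model(f, lb, ub):
    """Hypothetical wrapper: the attack queries g(z) = f(unit_to_box(z, lb, ub)),
    so it keeps working on inputs in [0, 1]^d."""
    return lambda z: f(unit_to_box(z, lb, ub))

lb = np.array([0.4, -2.0, 3.0])
ub = np.array([0.6, 2.0, 3.0])
x0 = np.array([0.5, 1.0, 3.0])  # original input inside [lb, ub]
z0 = box_to_unit(x0, lb, ub)    # reference point for the attack

# With eps = 1, [z0 - 1, z0 + 1] clipped to [0, 1] is all of [0, 1]^d,
# which maps back to the whole box [lb, ub].
lo = unit_to_box(np.clip(z0 - 1.0, 0.0, 1.0), lb, ub)
hi = unit_to_box(np.clip(z0 + 1.0, 0.0, 1.0), lb, ub)
```

Unlike the centred [-1, 1] parametrisation, every point of [LB, UB] is reachable here even when the attack clips to [0, 1]^d.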

If the input is already in [0, 1]^d, as for MNIST, the map is the identity, so the results should be the same.
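A quick check of the identity case, assuming MNIST-sized inputs with LB = 0 and UB = 1 in every component: the map [0, 1] -> [LB, UB] returns its input unchanged, so an eps = 0.1 perturbation in the rescaled space is the same perturbation on the image. The random "image" and the helper name `unit_to_box` are placeholders for illustration.

```python
import numpy as np

def unit_to_box(z, lb, ub):
    """Map z in [0, 1]^d to x in [lb, ub], componentwise."""
    return lb + (ub - lb) * z

# MNIST pixels already live in [0, 1], so lb = 0 and ub = 1 componentwise.
d = 28 * 28
lb, ub = np.zeros(d), np.ones(d)
rng = np.random.default_rng(0)
x = rng.random(d)               # placeholder image with values in [0, 1)
mapped = unit_to_box(x, lb, ub)  # identical to x: the map is the identity
```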

Let me know if this helps!

shubhamugare commented 1 year ago

Great! That worked well for me.

Thanks!

fra31 / auto-attack, issue #97