csce585-mlsystems / project-athena

This is the course project for CSCE585: ML Systems. Students will build their machine learning systems based on the provided infrastructure --- Athena.
MIT License

Broken attack methods in Athena #16

Open andrewwunderlich opened 4 years ago

andrewwunderlich commented 4 years ago

First things first, I am on Windows OS, and I have successfully created adversarial examples using the FGSM, PGD, and Spatial Transformation methods. I'm fairly confident that I know how to create attacks using the provided framework, so I believe this is a real bug rather than a user error.

Several of Athena's built-in attacks appear to have no effect on the images. Specifically, I have found that the JSMA and DeepFool attacks produce no observable change in the image, even for attack intensities much higher than the default values. (There may be other broken attacks that I have not tried yet. If I find more, I will update this thread.) Additionally, I have found that the undefended model's predictions for these attacks are identical to its predictions for the benign examples in every case, which is further evidence that the attacks are not doing anything.
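One way to turn "no observable effect" into a number is to compare the generated examples against the benign ones directly. The following is a minimal sketch, assuming both sets are available as NumPy arrays of the same shape; `perturbation_stats` and the stand-in arrays are hypothetical names for illustration and are not part of Athena.

```python
import numpy as np

def perturbation_stats(benign, adversarial):
    """Summarize how much an attack changed a batch of images."""
    benign = np.asarray(benign, dtype=np.float64)
    adversarial = np.asarray(adversarial, dtype=np.float64)
    if benign.shape != adversarial.shape:
        raise ValueError("benign and adversarial batches must have the same shape")
    # Flatten each sample so the norms are computed per image.
    diff = (adversarial - benign).reshape(len(benign), -1)
    return {
        "changed_samples": int(np.any(diff != 0, axis=1).sum()),
        "max_linf": float(np.abs(diff).max()),
        "mean_l2": float(np.linalg.norm(diff, axis=1).mean()),
    }

# Hypothetical stand-in data: 4 MNIST-sized images with pixel values in [0, 1].
rng = np.random.default_rng(0)
x_benign = rng.random((4, 28, 28, 1))
x_noop = x_benign.copy()  # what a broken (no-op) attack would return
# A sign-noise perturbation, used only as an example of a working attack's output.
x_perturbed = np.clip(
    x_benign + 0.1 * np.sign(rng.standard_normal(x_benign.shape)), 0.0, 1.0
)

stats_noop = perturbation_stats(x_benign, x_noop)
stats_perturbed = perturbation_stats(x_benign, x_perturbed)
print(stats_noop)
print(stats_perturbed)
```

If `changed_samples` is 0 and `max_linf` is 0.0 for the JSMA or DeepFool output, the attack returned the input unchanged, which matches the identical predictions described above.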

For example, this is one of the attack configs from attack-zk-mnist.json

    "configs14": {
        "attack": "jsma",
        "description": "jsma_theta0.3",
        "theta": 0.3,
        "gamma": 0.7
    }

The chosen values of theta = 0.3 and gamma = 0.7 are higher than the default values of theta = 0.15 and gamma = 0.5, so this attack should really be doing something noticeable to the image and should be tricking the undefended model in at least some cases. However, as you can see, the images look completely unperturbed: [image] [image]
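It is also worth confirming that the configured values are what actually gets parsed. This is a minimal sketch that loads the entry above and reports which parameters differ from the defaults quoted in this comment. The surrounding file layout and the `overrides` helper are assumptions for illustration; only the `configs14` entry itself comes from attack-zk-mnist.json.

```python
import json

# Defaults as quoted in the comment above; not verified against Athena's source.
JSMA_DEFAULTS = {"theta": 0.15, "gamma": 0.5}

# Assumed layout: a top-level object keyed by config name.
config_text = """
{
  "configs14": {
    "attack": "jsma",
    "description": "jsma_theta0.3",
    "theta": 0.3,
    "gamma": 0.7
  }
}
"""

def overrides(config, defaults):
    """Return {param: (default, configured)} for params that differ from the defaults."""
    return {
        key: (default, config[key])
        for key, default in defaults.items()
        if key in config and config[key] != default
    }

configs = json.loads(config_text)
for name, cfg in configs.items():
    if cfg.get("attack") == "jsma":
        print(name, overrides(cfg, JSMA_DEFAULTS))
```

If the overrides show up here but the images are still unchanged, the problem is more likely in how the parameters are passed to the attack than in the config file.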

I can provide more code snippets if desired but I am not sure what else would be useful at the moment, as it seems that the source of the error is not in my own attack generating script, but rather in one of the deeper methods supplied by the project source code.

MENG2010 commented 4 years ago

Under investigation. The baseline AEs were generated using a different toolkit, so the values may differ.

cjshearer commented 4 years ago

I have the same problem. I have generated 10 different variations of the spatial transformation, none of which fool the UM and all of which have exactly the same error rate. I too see that the images are not transformed at all. Here is a link to the current commit. Only 5 are shown in the attack config, but 10 have been added to the /data/ folder. What should I do regarding the task1 report?

andrewwunderlich commented 4 years ago

Actually I have been successful with the spatial transformation attack--that one works fine for me. @CJShearer I'm curious to know what values you are using for rotation and translation. Keep in mind you might have to have high numbers to generate errors because CNNs have good spatial invariance properties. I tried attacks with rotations from 10 to 50 degrees and found that >30 degree rotation generated a lot of errors.

cjshearer commented 4 years ago

@andrewwunderlich That's a good point about CNNs having good spatial invariance. If you only got good results with rotations above 30 degrees, then I suppose it's not surprising I didn't have any luck, as I capped rotations at 30 degrees and restricted translations/rotations to 10% or 20% of the image pixels (based on the loss landscapes on page 5 of this paper).

Here is a screenshot of the values I used for the spatial-transform attack: [image]

Ultimately, I abandoned spatial-transform in favor of BIM.

andrewwunderlich commented 4 years ago

Yeah, I see. I would expect that those attacks wouldn't be very effective in fooling the CNN. In any case, I think you should be able to see whether the attack itself is actually working just by observing the images. Regardless of how effective the spatial transformation attack is, I could definitely see that the images were being rotated in the image plots. In contrast, with JSMA and DeepFool I found that the attacked images were completely identical to the benign images, so it was clear that those attack methods were broken.