facebookresearch / ImageNet-Adversarial-Training

ImageNet classifier with state-of-the-art adversarial robustness

Question on running the evaluation #10

Closed: vexilligera closed this issue 5 years ago

vexilligera commented 5 years ago

Hello, thanks for the contribution. I was trying to evaluate the method with the command `python main.py --eval --data ./ILSVRC/Data/CLS-LOC --load ./models/R152-Denoise.npz --attack-iter 100 --attack-epsilon 16.0`, where `./ILSVRC/Data/CLS-LOC` is the folder containing the ImageNet validation set. I added a line of code in `imagenet_utils.py` to print `acc1.ratio` in real time. However, the top-1 error is around 0.99. Am I misunderstanding the metrics?

vexilligera commented 5 years ago

Sorry, forgot to add --arch ResNet -d 152.

vexilligera commented 5 years ago

I managed to get the top-1 accuracy of the ResNet-152 denoise model down to around 19% in the white-box setting using a randomized 200-step PGD attack with epsilon = 8. Still, it's a powerful defense. Let's see if I can go any further...

cihangxie commented 5 years ago

Is it a targeted attack? If yes, that is interesting.

ppwwyyxx commented 5 years ago

Is this your own attack method? It would be interesting to know how it works.

vexilligera commented 5 years ago

> Is it a targeted attack? If yes, that is interesting.

After validating on about 9000 samples, the top-1 accuracy is about 14.7% with epsilon = 16. I haven't looked at the success rate as suggested, though; I simply added random noise at each iteration of the PGD optimization. I discovered the attack while doing research against my own defense and other work on adversarial examples. I'll probably release a paper when I find more.

vexilligera commented 5 years ago

> Is this your own attack method? It would be interesting to know how it works.

Here's the one-line modification I made to the standard PGD procedure in `adv_model.py` to achieve an error rate of ~80% with epsilon = 8:

```python
def one_step_attack(adv):
    # added line: re-randomize the iterate uniformly inside the epsilon ball
    adv = adv + tf.random_uniform(tf.shape(adv), minval=-self.epsilon, maxval=self.epsilon)
    # everything else stays the same
```
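
For context, here's a rough sketch of what a full randomized PGD step along these lines could look like. This is not the repo's actual code: the function name `randomized_pgd_step` and its arguments (`model_func`, `epsilon`, `step_size`, and the clip bounds) are placeholders for whatever the surrounding attack loop provides.

```python
import tensorflow as tf  # TF 1.x-style API, matching the tf.random_uniform call above

def randomized_pgd_step(adv, label, model_func, epsilon, step_size,
                        lower_bound, upper_bound):
    """One untargeted PGD step with an extra re-randomization of the iterate."""
    # extra line: jitter the current iterate uniformly inside the epsilon ball
    adv = adv + tf.random_uniform(tf.shape(adv), minval=-epsilon, maxval=epsilon)
    # standard PGD step: ascend the cross-entropy loss w.r.t. the input ...
    logits = model_func(adv)
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits)
    grad, = tf.gradients(losses, adv)
    # ... then take a signed step and project back to the valid pixel range
    return tf.clip_by_value(adv + tf.sign(grad) * step_size, lower_bound, upper_bound)
```
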
ppwwyyxx commented 5 years ago

> Sorry, forgot to add --arch ResNet -d 152.

This may sound silly, but if this is indeed what you're using, it's wrong: you need `--arch ResNetDenoise`, as described in our INSTRUCTIONS.md. With `--arch ResNet` it does give about 19% accuracy, because you're loading the wrong model for the architecture.

The extra line you added to the attack is not effective in our local tests. I recommend first verifying that you're using the correct command-line arguments and can reproduce the accuracy in our model zoo.

vexilligera commented 5 years ago

>> Sorry, forgot to add --arch ResNet -d 152.
>
> This may sound silly, but if this is indeed what you're using, it's wrong: you need `--arch ResNetDenoise`, as described in our INSTRUCTIONS.md. With `--arch ResNet` it does give about 19% accuracy, because you're loading the wrong model for the architecture.
>
> The extra line you added to the attack is not effective in our local tests. I recommend first verifying that you're using the correct command-line arguments and can reproduce the accuracy in our model zoo.

You're right! Very sorry about that! I didn't run the evaluation thoroughly enough to check against the reported accuracy, because the figures were already pretty high and I thought the attack was working. For a second I even thought it might be effective. It was a funny mistake on my part.

vexilligera commented 5 years ago

Hello everyone, I've gotten more careful after last time; that was pretty silly of me, LOL. Later in the day I came up with another attack. It isn't really targeted, but I wanted to share some preliminary results. To make sure I'm not making the same mistake this time, here's the command, and I confirmed it reproduces the accuracy in the paper: `python main.py --eval --data ./ILSVRC/Data/CLS-LOC --load ./models/R152-Denoise.npz --attack-iter 200 --attack-epsilon 16.0 --batch 25 --arch ResNetDenoise -d 152`

Apart from the standard PGD loss, I added an L2 loss term to encourage the latent feature of the adversarial example to be different from that of the original image, so the loss now looks something like `loss = cross_entropy(logits, label) - L2(dirty_feature, clean_feature) * constant` with `constant = 0.5`; of course I'll look for better constants if it really works. But as the constant gets larger, the attack gets less targeted.
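
To be concrete, here's a minimal sketch of that combined loss in TF 1.x. The tensor names (`dirty_feature`, `clean_feature`) are placeholders for features taken from two forward passes, and the sign follows the formula above as written; whether the feature term should be added or subtracted depends on whether the PGD loop maximizes or minimizes this loss.

```python
import tensorflow as tf  # TF 1.x

def feature_divergence_loss(logits, label, dirty_feature, clean_feature, constant=0.5):
    """Cross-entropy combined with a squared-L2 feature term, as in the formula above."""
    # standard untargeted PGD objective on the adversarial logits
    xent = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits))
    # squared L2 distance between the adversarial and clean latent features
    feat_l2 = tf.reduce_sum(tf.squared_difference(dirty_feature, clean_feature))
    # loss = cross_entropy - L2(dirty_feature, clean_feature) * constant
    return xent - feat_l2 * constant
```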

Currently I'm getting a top-1 error rate of around 75%, but a success rate below 1%. I'm still playing around.

cihangxie commented 5 years ago

Yes, it is not a targeted attack.

For a targeted version, you may try `loss = cross_entropy(logits, target_label) + L2(target_feature, orig_feature) * constant`. You may refer to this paper for more details: https://arxiv.org/pdf/1511.05122.pdf

It is also strongly recommended to evaluate your success rate, as it is a better measurement of the strength of your targeted attacks.
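
A minimal sketch of that targeted objective (written to be minimized), with hypothetical tensor names; here `target_feature` is the latent feature of an image from the target class, and `orig_feature` is read as the feature of the image being perturbed.

```python
import tensorflow as tf  # TF 1.x

def targeted_feature_loss(logits, target_label, target_feature, orig_feature, constant=0.5):
    """Targeted objective: push logits toward the target class and pull the
    perturbed image's feature toward the target image's feature."""
    xent = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=target_label, logits=logits))
    feat_l2 = tf.reduce_sum(tf.squared_difference(target_feature, orig_feature))
    # loss = cross_entropy(logits, target_label) + L2(target_feature, orig_feature) * constant
    return xent + feat_l2 * constant
```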

vexilligera commented 5 years ago

> Yes, it is not a targeted attack.
>
> For a targeted version, you may try `loss = cross_entropy(logits, target_label) + L2(target_feature, orig_feature) * constant`. You may refer to this paper for more details: https://arxiv.org/pdf/1511.05122.pdf
>
> It is also strongly recommended to evaluate your success rate, as it is a better measurement of the strength of your targeted attacks.

Thanks for the information. The targeted attack looks daunting, and I'll probably need some kind of feature embedding to get it to work: intuitively, your models should be much less susceptible to such specific perturbations that transform one image into another, so the feature after global average pooling (GAP) might work better than the 2048x7x7 feature map for this task. I'm also not sure about the possible outcomes of the targeted attack you mentioned, but I'll take a closer look at that paper and see if things work out :)
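
For reference, the GAP feature I mean is just the spatial mean of that last 2048x7x7 feature map; a tiny sketch, assuming an NCHW layout for a hypothetical `feature_map` tensor:

```python
import tensorflow as tf  # TF 1.x

# Hypothetical last-block feature map of shape [batch, 2048, 7, 7] (NCHW layout assumed)
feature_map = tf.placeholder(tf.float32, [None, 2048, 7, 7])
# Global average pooling collapses the 7x7 grid into one 2048-d vector per image,
# which could replace the full feature map in the L2 term of the attack loss
gap_feature = tf.reduce_mean(feature_map, axis=[2, 3])  # shape [batch, 2048]
```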