Trevillie / MagNet

MagNet: a Two-Pronged Defense against Adversarial Examples
BSD 2-Clause "Simplified" License

Inconsistency with your results of Figure 4&5 in the paper #3

Closed zoujx96 closed 6 years ago

zoujx96 commented 6 years ago

Hi! I'm trying to reproduce your results on defense performance at different confidences of Carlini's L2 attack on MNIST and CIFAR-10 (Figures 4 and 5), but I ran into some issues.

For MNIST: I generated about 10,000 adversarial samples for each confidence (0.0, 10.0, 20.0, 30.0, 40.0), about 50,000 samples in total, and got the following graph.

[figure: defense_performance_mnist]

My no_defense curve trends downward, with high accuracy at confidence 0.0, but your no_defense accuracy stays at 0% for every confidence. Since higher confidence yields a higher attack success rate, I wonder how you obtained such a curve.

For CIFAR-10: I generated about 10,000 adversarial samples for each confidence (0.0, 20.0, 40.0, 60.0, 80.0, 100.0), about 60,000 samples in total, and got the following graph.

[figure: defense_performance_cifar]

Besides the same no_defense issue as on MNIST, my with_detector curve is also inconsistent with yours: your curve in the paper shows an upward trend, while in my test nearly 99% of the adversarial samples pass the detector. The detector does not seem to work.

I saw your comment in issue #1, so I also tried your new autoencoder architecture and a better classifier with 86% accuracy, and got the following graph.

[figure: defense_performance_cifar_better]

The trends of the curves did not change; I only obtained a higher starting accuracy.

To sum up, there are two issues: 1) my no_defense curve is inconsistent with yours, and 2) the detector seems to make no difference in my experiment. Could you help me figure out the problem? Thank you very much!

Trevillie commented 6 years ago

Hi @zoujx96

For 1: please read Carlini's paper and the MagNet paper again and make sure you understand this point: in the white-box threat model, Carlini's attack achieves a 100% attack success rate on the MNIST and CIFAR-10 models even at zero confidence. Therefore the green (no_defense) line should be flat and stay near zero everywhere. If that is not the case in your experiment, your attack is flawed somewhere.

For 2: this doesn't make sense to me, so I won't try to explain it.

I don't know how you generated your adversarial examples, so I can't say for sure, but here is my guess: in Nick's attack code, the generated attack images have pixel values in the range [-0.5, 0.5], while my defense code expects input images with pixel values in [0, 1]. This mismatch would cause exactly this kind of confusion. Please check whether this is the cause and report back here.
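If the range mismatch described above is indeed the cause, the fix is a one-line rescale of the attack output before it is fed to the defense. A minimal sketch (the helper name `to_unit_range` and the random batch are illustrative, not from either codebase):

```python
import numpy as np

def to_unit_range(x):
    """Shift images from [-0.5, 0.5] (the attack code's convention)
    into [0, 1] (the defense code's convention)."""
    return np.clip(x + 0.5, 0.0, 1.0)

# Example: a dummy batch standing in for attack outputs in [-0.5, 0.5].
adv = np.random.uniform(-0.5, 0.5, size=(4, 32, 32, 3)).astype(np.float32)
adv_01 = to_unit_range(adv)
assert adv_01.min() >= 0.0 and adv_01.max() <= 1.0
```

The `np.clip` guards against tiny floating-point excursions outside the target range after the shift.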

Thanks.

zoujx96 commented 6 years ago

Hi @Trevillie Thank you very much for your reply! I suspect the issue lies in the range of pixel values. I see that in Carlini's code the pixel values are in the range (-0.5, 0.5), so I want to confirm: do you generate Carlini's attack samples from the CIFAR-10 dataset with pixel values in (0, 1), or do you generate them with (-0.5, 0.5) pixel values and then shift the adversarial samples to (0, 1) by adding 0.5?

Trevillie commented 6 years ago

@zoujx96 Directly on the (0, 1) dataset.
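In other words, the raw uint8 pixels are scaled straight into [0, 1] before the attack runs, so the adversarial examples come out in the range the defense expects. A minimal sketch, using a dummy uint8 array as a stand-in for the real CIFAR-10 data (which would come from a loader such as `keras.datasets.cifar10.load_data`):

```python
import numpy as np

# Stand-in for raw CIFAR-10 pixels: uint8 values in 0..255.
raw = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Scale directly to [0, 1]; the attack is then run on x as-is,
# with no extra +/- 0.5 shift anywhere in the pipeline.
x = raw.astype(np.float32) / 255.0
assert 0.0 <= x.min() and x.max() <= 1.0
```

Keeping a single pixel convention end to end (data loading, attack, and defense all in [0, 1]) avoids the mismatch suspected earlier in the thread.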