gongzhitaao / adversarial-classifier

It turns out that adversarial and clean data are not twins, not at all.
https://arxiv.org/abs/1704.04960
MIT License
19 stars 6 forks

Little Doubt about Article #3

Open EmotionalXX opened 4 years ago

EmotionalXX commented 4 years ago

Dear Sir,

I have just taken my first step in scientific research, focused on algorithms for defending against adversarial examples. Recently I read your paper "Adversarial and Clean Data Are Not Twins". I think it is very interesting, and your clear and compact presentation inspired me a lot. But I have a doubt about it.

In my opinion, because the binary classifier has a very high false negative rate, it may tend to recognize adversarial samples as clean samples.

Maybe I have misunderstood something. Could you please address my doubt? I would really appreciate your help.

Thank you very much!

gongzhitaao commented 4 years ago

Hey @EmotionalXX,

I don't think so. Please take a look at Table 1: the first two columns of f2, X_test and X_test^{adv(f1)}, are the scores of the adversarial classifier on clean and adversarial examples respectively. Both figures are almost 100%.

I think by "high false negative", you mean the last two columns of Table 1, which show adversarial examples targeted at f2, the classifier. I'm trying to prove here that the adversarial classifier is robust to a second-round attack. Essentially, it means suppose you want to attack f1, but you know there is a classifier f2 that filters out adversarial examples targeted at f1. You may want to generate adversarial examples that 1) fools f1 2) bypasses f2. But according to the last two columns of table 1, you cannot easily do this.

Hope this helps :smile:

EmotionalXX commented 4 years ago

Dear Sir,

As Table 1 shows, the score on X_test^{adv(f2)} is nearly 0. What does that mean? I still don't understand it.

Thank you very much!

gongzhitaao commented 4 years ago

Let's use the notation above, i.e.,

  1. f1 is the target classifier,
  2. f2 is the adversarial classifier.

You may want to generate adversarial examples that 1) fool f1 and 2) bypass f2.

X_test^{adv(f2)} refers to the adversarial examples that fool f1 and successfully bypass f2. That figure is approaching zero, which means no adversarial examples crafted against f2 can bypass f2. Please go over the paragraph related to Table 1.
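
In the same hypothetical notation as the sketch above (stand-in names, not this repo's code), the second-round check amounts to crafting adversarial examples against f2 and counting how many both slip past f2 and still fool f1:

```python
import numpy as np

def second_round_bypass_rate(f1, f2, x_test, y_test, attack):
    """Sketch of the second-round check behind the last columns of Table 1.

    attack(model, x) is again a hypothetical stand-in; here it crafts
    adversarial examples against the detector f2 itself, trying to make
    them look clean to f2 while still fooling the target classifier f1.
    y_test holds the true integer labels for x_test.
    """
    x_adv_f2 = attack(f2, x_test)                                  # X_test^{adv(f2)}
    looks_clean = np.asarray(f2.predict(x_adv_f2)).ravel() <= 0.5  # slipped past f2
    fools_f1 = np.argmax(f1.predict(x_adv_f2), axis=1) != np.asarray(y_test)
    # Fraction that both bypass the detector and fool the target model;
    # a value near zero is what makes the detector robust to this attack.
    return float(np.mean(looks_clean & fools_f1))
```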

EmotionalXX commented 4 years ago

I think it is (X_test^{adv(f1)})^{adv(f2)} that denotes the adversarial examples that fool f1 and successfully bypass f2.