PKU-ML / AdvNotRealFeatures

Official code for reproducing the NeurIPS 2023 paper: Adversarial Examples Are Not Real Features

Paper Queries #3

Closed DeepOceanDeep closed 5 months ago

DeepOceanDeep commented 6 months ago

Hey,

I've got a few questions about some points in the paper. When you mention that adversarial examples aren't real features, does that also mean that using FGSM and its optimal perturbations isn't helpful for tasks other than classification?

Also, with non-robust features, starting from noise results in final images that look like noise, but when you feed them into a classifier, they still get classified with high probability. Have you looked into how starting from noise or from an image affects classification confidence?

And what's the difference between generating optimal perturbations with FGSM and creating non-robust features?

Thank you

Charles20021201 commented 5 months ago

Hi, thanks for reading our work! Let me answer your questions as follows:

(1) Do you mean "transferable" by "helpful"? As we show experimentally, adversarial perturbations hardly transfer between paradigms (generative vs. discriminative). I believe FGSM is likely to behave similarly to the PGD attack considered in the paper.
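For concreteness, here is a minimal sketch of how such a transfer experiment could be set up (purely illustrative, not the exact code in this repo; `source_model`, `target_model`, and `loader` are placeholder names for an already-trained classifier from one paradigm, a classifier from the other paradigm, and a CIFAR-10 loader):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Untargeted L_inf PGD: repeat a signed-gradient step and project back
    # into the eps-ball around the clean input.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def transfer_accuracy(source_model, target_model, loader, device="cuda"):
    # Craft adversarial examples on source_model, then check how often
    # target_model still classifies them correctly
    # (high accuracy on target_model = weak transfer).
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(source_model, x, y)
        with torch.no_grad():
            pred = target_model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total
```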

(2) Yes. We observed that non-robust datasets constructed from noise (denoted D_noise) exhibit worse performance than those constructed from natural images (denoted D_natural). For example, on CIFAR-10, a ResNet-18 trained on D_noise achieves around 30% lower test accuracy than one trained on D_natural.
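For reference, the construction of such a dataset boils down to a targeted attack toward a re-assigned label, and the only difference between D_noise and D_natural is the starting point. A minimal sketch (illustrative only; the constraint set and hyperparameters in the paper's code may differ, and `model` is assumed to be a standard-trained classifier):

```python
import torch
import torch.nn.functional as F

def make_nonrobust_example(model, x_start, target, steps=100, alpha=0.01):
    # Build one sample of a non-robust dataset: starting from x_start
    # (a natural image for D_natural, or random noise for D_noise),
    # take gradient steps so the standard model assigns the target label.
    # The resulting (x, target) pair is then used as a training example.
    x = x_start.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), target)
        grad = torch.autograd.grad(loss, x)[0]
        x = (x.detach() - alpha * grad.sign()).clamp(0, 1)  # step toward target class
    return x.detach()

# D_natural: x_start is a real image whose label is re-assigned to `target`.
# D_noise:   x_start = torch.rand_like(x_natural), i.e., pure noise.
```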

(3) First of all, I am afraid that FGSM does NOT produce optimal perturbations, since there are many stronger adversarial attacks, e.g., PGD and the CW attack. FGSM, which is essentially an optimization algorithm, can be used to find non-robust features.
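To make the relation between the two attacks concrete: FGSM is a single signed-gradient step of size eps, while PGD repeats smaller steps of the same form and projects back into the eps-ball, which is why it typically finds stronger (though still not provably optimal) perturbations. A minimal, illustrative sketch:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    # Fast Gradient Sign Method: a single step of size eps along the sign of
    # the input gradient. PGD is essentially this step iterated with a smaller
    # step size, plus a projection back into the eps-ball around x.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x.detach() + eps * grad.sign()).clamp(0, 1)
```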

If you have more questions, just ask! Charles

DeepOceanDeep commented 5 months ago

Thanks for your thoughts and answers, @Charles20021201. Regarding Question 3: I mean, is there any difference between the optimal perturbations obtained by attacks like FGSM, PGD, CW, etc. and non-robust features? I feel these are the same concept under different names.

Charles20021201 commented 5 months ago

Hi, I would say there are still subtle differences between adversarial perturbations and non-robust features. For example, consider using PGD to attack a robustly trained model. Since the robust model mostly relies on robust features, the resulting adversarial perturbations are more like robust features than non-robust ones. But your intuition is mostly correct: given a standard (non-robust) model and some data, adversarial attacks are indeed the most common way to find non-robust features.

DeepOceanDeep commented 5 months ago

Thanks very much, @Charles20021201, for your insightful and helpful answers. Appreciate it!