Open gatheluck opened 4 years ago
Two effective directions for defending against adversarial examples (AEs) are:
It has been pointed out that rejection relying only on the features learned by the network's final layer is insufficient (at high-level representations, AEs are indistinguishable from their target class).
To solve this problem, the paper proposes Deep Neural Rejection (DNR), which analyzes features at multiple layers and rejects based on them.
Similar rejection-based methods exist, but the proposed method does not need to generate AEs at training time, so its computational cost is lower.
Useful passage (on the importance of evaluation methodology)
It is worth remarking here that correctly evaluating a defense mechanism is a crucial point when proposing novel defenses against adversarial examples [2], [17]. The majority of previous work proposing defense methods against adversarial examples has only evaluated such defenses against previous attacks rather than against an ad-hoc attack crafted specifically against the proposed defense (see, e.g., [15], [18], [19] and all the other re-evaluated defenses in [17], [20]). The problem with these black-box and gray-box evaluations in which the attack is essentially unaware of the defense mechanism is that they are overly optimistic. It has indeed been shown afterwards that such defenses can be easily bypassed by simple modifications to the attack algorithm [17], [20], [21]. For instance, many defenses have been found to perform gradient obfuscation, i.e., they learn functions which are harder to optimize for gradient-based attacks; however, they can be easily bypassed by constructing a smoother, differentiable approximation of their function, e.g., via learning a surrogate model [2], [6], [22]–[25] or replacing network layers which obfuscate gradients with smoother mappings [17], [20], [21]. In our case, an attack that is unaware of the defense mechanism may tend to craft adversarial examples in areas of the input space which are assigned to the rejection class; thus, such attacks, as well as previously-proposed ones, may rarely bypass our defense. For this reason, we believe that our adaptive white-box attack, along with the security evaluation methodology adopted in this work, provide another significant contribution to the state of the art related to the problem of properly evaluating defenses against adversarial examples.
The paper adopts the evaluation methodology described below (measuring accuracy while increasing the perturbation budget).
The corresponding security evaluation curve [2] shows how gracefully the performance decreases while the attack increases in strength, up to the point where the defense reaches zero accuracy.
It is also important to show that the breakdown point of the defense occurs at a larger perturbation than for existing methods (i.e., that it is more robust than prior defenses).
Another relevant point is to show that such a breakdown point occurs at a larger perturbation than that exhibited by competing defenses, to show that the proposed defense is more robust than previously-proposed ones.
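The security evaluation curve described in the two quotes above can be illustrated on a toy model. The sketch below assumes a linear classifier, for which the worst-case L2 attack is known in closed form (it reduces the signed margin by exactly the budget eps); the function and variable names are mine, not from the paper.

```python
import numpy as np

def security_curve(margins, eps_grid):
    """Robust accuracy of a linear classifier under worst-case L2 attacks.

    For f(x) = w @ x, the optimal L2 perturbation of budget eps shifts the
    signed margin y * (w @ x) / ||w|| down by exactly eps, so robust
    accuracy at eps is simply the fraction of margins exceeding eps.
    """
    margins = np.asarray(margins, dtype=float)
    return np.array([(margins > e).mean() for e in eps_grid])

rng = np.random.default_rng(0)
w = np.array([1.0, -0.5])
X = rng.normal(size=(500, 2))
y = np.sign(X @ w)                        # labels realizable by w: clean accuracy is 1.0
margins = y * (X @ w) / np.linalg.norm(w)

eps_grid = np.linspace(0.0, 3.0, 7)       # attack strength sweep
curve = security_curve(margins, eps_grid)
print(curve)                              # decreases gracefully toward zero
```

The curve starts at the clean accuracy and decays monotonically as the attack strengthens; the eps at which it collapses is the breakdown point the quote refers to.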
Evaluation experiments were conducted on MNIST and CIFAR10.
The proposed Deep Neural Rejection selects several layers of the classifier, trains an RBF SVM on the output of each selected layer, and then trains another RBF SVM on the combined outputs. A sample is rejected if the logit of the predicted class falls below a threshold.
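The pipeline just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: random linear projections stand in for real DNN layer activations, and the threshold and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Stand-ins for the outputs of two selected network layers (toy assumption).
layers = [X @ rng.normal(size=(20, d)) for d in (16, 8)]

# One RBF SVM per selected layer.
layer_svms = [SVC(kernel="rbf", gamma="scale").fit(f, y) for f in layers]
scores = [svm.decision_function(f) for svm, f in zip(layer_svms, layers)]

# Combiner RBF SVM trained on the stacked per-layer scores.
combiner = SVC(kernel="rbf", gamma="scale").fit(np.hstack(scores), y)

def dnr_predict(x_layers, threshold=0.0):
    """Predict classes; -1 marks a rejected sample."""
    s = np.hstack([svm.decision_function(f)
                   for svm, f in zip(layer_svms, x_layers)])
    logits = combiner.decision_function(s)
    pred = combiner.classes_[np.argmax(logits, axis=1)]
    pred[np.max(logits, axis=1) < threshold] = -1   # reject low-confidence samples
    return pred

preds = dnr_predict(layers)
print("reject rate:", np.mean(preds == -1))
```

Raising `threshold` trades off rejection rate against the risk of accepting AEs, which is exactly the knob swept in the paper's security evaluation.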
Evaluated on MNIST and CIFAR10. The train/test split is as follows:
We average the results on five different runs. In each run, we consider 10,000 training samples and 1,000 test samples, randomly drawn from the corresponding datasets. The deep neural networks (DNNs) used in our experiments are pre-trained on a separate split of 30,000 and 40,000 training samples, respectively for MNIST and CIFAR10.
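The sampling protocol in the quote (five runs, each drawing 10,000 training and 1,000 test samples at random) can be sketched as below; the function name and the total-pool size are mine, for illustration only.

```python
import numpy as np

def draw_split(n_total, n_train=10_000, n_test=1_000, seed=0):
    """Draw disjoint train/test index sets at random (one experimental run)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    return idx[:n_train], idx[n_train:n_train + n_test]

# Five runs, each a fresh random draw; the DNN backbone is assumed to be
# pre-trained beforehand on a separate split (30k for MNIST, 40k for CIFAR10).
splits = [draw_split(60_000, seed=s) for s in range(5)]
```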
Accuracy against l2-norm white-box attacks:
The proposed method (DNR) is more accurate than the existing method (NR). It was also found that fooling DNR requires more structured perturbations.
Related work:
- rejects data outside the training distribution (does not evaluate rejection of AEs)
- uses only a single layer's features for rejection
- uses multiple layers' features for rejection, but requires generating AEs at training time
Paper link
https://arxiv.org/abs/1910.00470
Publication date (yyyy/mm/dd)
2019/10/01
Summary
Proposes a method for detecting AEs: samples with anomalous feature representations at different network layers are rejected. A new attack against this detection scheme is also proposed and used for evaluation. The proposed method is more accurate than existing methods.
Memo
TeX