gatheluck / PaperReading

Notes about papers (in Japanese)

[2019] Deep Neural Rejection against Adversarial Examples #29

Open gatheluck opened 4 years ago

gatheluck commented 4 years ago

Paper link

https://arxiv.org/abs/1910.00470

Publication date (yyyy/mm/dd)

2019/10/01

Summary

Proposes a method for detecting adversarial examples (AEs): samples whose feature representations are anomalous at different layers of the network are rejected. A new attack targeting this detection scheme is also proposed and used for evaluation. The proposed method was more accurate than existing methods.

Notes

TeX

% 2019/10/01
@article{
    sotgiu2019deep,
    title={Deep Neural Rejection against Adversarial Examples},
    author={Angelo Sotgiu and Ambra Demontis and Marco Melis and Battista Biggio and Giorgio Fumera and Xiaoyi Feng and Fabio Roli},
    journal={arXiv preprint arXiv:1910.00470},
    year={2019}
}
gatheluck commented 4 years ago

The two directions that are effective as defenses against AEs are:

gatheluck commented 4 years ago

It has been pointed out that rejection based only on the features learned by the network's final layer is insufficient. (At high-level representations, AEs cannot be distinguished from the target class.)

gatheluck commented 4 years ago

To address this problem, the paper proposes Deep Neural Rejection (DNR), which analyzes features at multiple layers to decide whether to reject.

gatheluck commented 4 years ago

Similar rejection methods include:

However, the proposed method does not need to generate AEs at training time, so its computational cost is lower.

gatheluck commented 4 years ago

Useful passage (on the importance of the evaluation methodology)

It is worth remarking here that correctly evaluating a defense mechanism is a crucial point when proposing novel defenses against adversarial examples [2], [17]. The majority of previous work proposing defense methods against adversarial examples has only evaluated such defenses against previous attacks rather than against an ad-hoc attack crafted specifically against the proposed defense (see, e.g., [15], [18], [19] and all the other re-evaluated defenses in [17], [20]). The problem with these black-box and gray-box evaluations in which the attack is essentially unaware of the defense mechanism is that they are overly optimistic. It has indeed been shown afterwards that such defenses can be easily bypassed by simple modifications to the attack algorithm [17], [20], [21]. For instance, many defenses have been found to perform gradient obfuscation, i.e., they learn functions which are harder to optimize for gradient-based attacks; however, they can be easily bypassed by constructing a smoother, differentiable approximation of their function, e.g., via learning a surrogate model [2], [6], [22]–[25] or replacing network layers which obfuscate gradients with smoother mappings [17], [20], [21]. In our case, an attack that is unaware of the defense mechanism may tend to craft adversarial examples in areas of the input space which are assigned to the rejection class; thus, such attacks, as well as previously-proposed ones, may rarely bypass our defense. For this reason, we believe that our adaptive white-box attack, along with the security evaluation methodology adopted in this work, provide another significant contribution to the state of the art related to the problem of properly evaluating defenses against adversarial examples.

gatheluck commented 4 years ago

They adopt the evaluation methodology described in the paper below (accuracy is evaluated while the perturbation size is increased).

The corresponding security evaluation curve [2] shows how gracefully the performance decreases while the attack increases in strength, up to the point where the defense reaches zero accuracy.
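A minimal sketch of such a security evaluation curve, assuming a hypothetical attack(model, x, y, eps) that returns l2-bounded adversarial examples and a predict() that returns -1 for rejected samples; counting a rejected adversarial sample as a defensive success is my assumption about the convention, not a quote from the paper:

import numpy as np

def security_evaluation_curve(model, attack, x, y, eps_values):
    # accuracy as a function of the attacker's perturbation budget eps
    accuracies = []
    for eps in eps_values:
        x_adv = x if eps == 0 else attack(model, x, y, eps)
        pred = model.predict(x_adv)  # -1 denotes "rejected"
        if eps == 0:
            # clean data: rejecting a legitimate sample counts as an error
            acc = np.mean(pred == y)
        else:
            # under attack: rejecting an adversarial sample counts as a success
            acc = np.mean((pred == y) | (pred == -1))
        accuracies.append(acc)
    return accuracies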

gatheluck commented 4 years ago

It is also important to show that the point where the defense breaks down occurs at a larger perturbation than for existing methods. (That is, the defense is more robust than prior ones.)

Another relevant point is to show that such a breakdown point occurs at a larger perturbation than that exhibited by competing defenses, to show that the proposed defense is more robust than previously-proposed ones.

gatheluck commented 4 years ago

Evaluation experiments were run on MNIST and CIFAR10:

gatheluck commented 4 years ago

The proposed Deep Neural Rejection selects several layers of the classifier. An RBF SVM is trained on the output of each selected layer, and a further RBF SVM is trained on the combined outputs of those SVMs. If the logit of the predicted class is below a threshold, the sample is rejected.
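A minimal sketch of this pipeline using scikit-learn, assuming the per-layer features have already been extracted from the DNN elsewhere; the class name, the feats_per_layer layout, and the threshold default are illustrative assumptions, not the authors' implementation:

import numpy as np
from sklearn.svm import SVC

class DeepNeuralRejection:
    """Per-layer RBF SVMs feed a combiner RBF SVM; low-scoring samples are rejected."""

    def __init__(self, layer_ids, threshold=0.0):
        self.layer_ids = layer_ids    # which DNN layers to tap
        self.threshold = threshold    # reject when the top combined score is below this
        self.layer_svms = {}
        self.combiner = SVC(kernel="rbf")

    def fit(self, feats_per_layer, y):
        # feats_per_layer: dict layer_id -> (n_samples, n_features) array
        scores = []
        for lid in self.layer_ids:
            svm = SVC(kernel="rbf")
            svm.fit(feats_per_layer[lid], y)
            self.layer_svms[lid] = svm
            scores.append(svm.decision_function(feats_per_layer[lid]))
        # the combiner SVM is trained on the concatenated per-layer scores
        # (decision_function is (n_samples, n_classes) for multiclass data)
        self.combiner.fit(np.hstack(scores), y)
        return self

    def predict(self, feats_per_layer):
        scores = [self.layer_svms[lid].decision_function(feats_per_layer[lid])
                  for lid in self.layer_ids]
        combined = self.combiner.decision_function(np.hstack(scores))
        pred = self.combiner.classes_[np.argmax(combined, axis=1)]
        # the combined score plays the role of the "logit" in the note above
        pred[combined.max(axis=1) < self.threshold] = -1  # -1 = rejected
        return pred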

gatheluck commented 4 years ago

Evaluated on MNIST and CIFAR10. The training/test split is as follows:

We average the results on five different runs. In each run, we consider 10,000 training samples and 1,000 test samples, randomly drawn from the corresponding datasets. The deep neural networks (DNNs) used in our experiments are pre-trained on a separate split of 30,000 and 40,000 training samples, respectively for MNIST and CIFAR10.
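A small sketch of that splitting protocol, assuming x and y hold the pool reserved for training/evaluating the rejector (disjoint from the DNN pre-training splits); the helper name and seeding are assumptions:

import numpy as np

def draw_run_split(x, y, n_train=10000, n_test=1000, seed=0):
    # one of the five runs: draw disjoint train/test samples at random
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    tr, te = idx[:n_train], idx[n_train:n_train + n_test]
    return x[tr], y[tr], x[te], y[te]

# average results over five runs, one seed per run
# results = [run_experiment(*draw_run_split(x, y, seed=r)) for r in range(5)]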

gatheluck commented 4 years ago

Accuracy against l2-norm white-box attacks.

gatheluck commented 4 years ago

The proposed method (DNR) is more accurate than the existing method (NR). It was also found that fooling DNR requires a more structured perturbation.

gatheluck commented 4 years ago

Related work: rejecting data outside the training distribution (rejection of AEs is not evaluated)

Methods that use features from only a single layer for rejection

Methods that use features from multiple layers for rejection, but require generating AEs at training time