Open gatheluck opened 4 years ago
Two effective directions for defending against adversarial examples (AEs) are:
It has been pointed out that rejection relying only on the features learned by the network's final layer is insufficient (at high-level representations, AEs are indistinguishable from their target class).
To solve this problem, the paper proposes Deep Neural Rejection (DNR), which analyzes features at multiple layers and rejects based on them.
Similar rejection-based methods exist, but the proposed method does not need to generate AEs at training time, so its computational cost is lower.
Useful passage (on the importance of evaluation methodology)
It is worth remarking here that correctly evaluating a defense mechanism is a crucial point when proposing novel defenses against adversarial examples [2], [17]. The majority of previous work proposing defense methods against adversarial examples has only evaluated such defenses against previous attacks rather than against an ad-hoc attack crafted specifically against the proposed defense (see, e.g., [15], [18], [19] and all the other re-evaluated defenses in [17], [20]). The problem with these black-box and gray-box evaluations in which the attack is essentially unaware of the defense mechanism is that they are overly optimistic. It has indeed been shown afterwards that such defenses can be easily bypassed by simple modifications to the attack algorithm [17], [20], [21]. For instance, many defenses have been found to perform gradient obfuscation, i.e., they learn functions which are harder to optimize for gradient-based attacks; however, they can be easily bypassed by constructing a smoother, differentiable approximation of their function, e.g., via learning a surrogate model [2], [6], [22]–[25] or replacing network layers which obfuscate gradients with smoother mappings [17], [20], [21]. In our case, an attack that is unaware of the defense mechanism may tend to craft adversarial examples in areas of the input space which are assigned to the rejection class; thus, such attacks, as well as previously-proposed ones, may rarely bypass our defense. For this reason, we believe that our adaptive white-box attack, along with the security evaluation methodology adopted in this work, provide another significant contribution to the state of the art related to the problem of properly evaluating defenses against adversarial examples.
The paper adopts the evaluation methodology described below (measuring accuracy while increasing the perturbation budget).
The corresponding security evaluation curve [2] shows how gracefully the performance decreases while the attack increases in strength, up to the point where the defense reaches zero accuracy.
It is also important to show that the breakdown point of the defense occurs at a larger perturbation than for existing methods (i.e., that it is more robust than prior defenses).
Another relevant point is to show that such a breakdown point occurs at a larger perturbation than that exhibited by competing defenses, to show that the proposed defense is more robust than previously-proposed ones.
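The security evaluation curve described in the two quotes above can be illustrated on a toy model. The sketch below assumes a linear classifier, for which the worst-case L2 attack is known in closed form (it reduces the signed margin by exactly the budget eps); the function and variable names are mine, not from the paper.

```python
import numpy as np

def security_curve(margins, eps_grid):
    """Robust accuracy of a linear classifier under worst-case L2 attacks.

    For f(x) = w @ x, the optimal L2 perturbation of budget eps shifts the
    signed margin y * (w @ x) / ||w|| down by exactly eps, so robust
    accuracy at eps is simply the fraction of margins exceeding eps.
    """
    margins = np.asarray(margins, dtype=float)
    return np.array([(margins > e).mean() for e in eps_grid])

rng = np.random.default_rng(0)
w = np.array([1.0, -0.5])
X = rng.normal(size=(500, 2))
y = np.sign(X @ w)                        # labels realizable by w: clean accuracy is 1.0
margins = y * (X @ w) / np.linalg.norm(w)

eps_grid = np.linspace(0.0, 3.0, 7)       # attack strength sweep
curve = security_curve(margins, eps_grid)
print(curve)                              # decreases gracefully toward zero
```

The curve starts at the clean accuracy and decays monotonically as the attack strengthens; the eps at which it collapses is the breakdown point the quote refers to.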
Evaluation experiments were conducted on MNIST and CIFAR10.
The proposed Deep Neural Rejection selects several layers of the classifier, trains an RBF SVM on the output of each selected layer, and then trains another RBF SVM on the combined outputs. A sample is rejected if the logit of the predicted class falls below a threshold.
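The pipeline just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: random linear projections stand in for real DNN layer activations, and the threshold and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Stand-ins for the outputs of two selected network layers (toy assumption).
layers = [X @ rng.normal(size=(20, d)) for d in (16, 8)]

# One RBF SVM per selected layer.
layer_svms = [SVC(kernel="rbf", gamma="scale").fit(f, y) for f in layers]
scores = [svm.decision_function(f) for svm, f in zip(layer_svms, layers)]

# Combiner RBF SVM trained on the stacked per-layer scores.
combiner = SVC(kernel="rbf", gamma="scale").fit(np.hstack(scores), y)

def dnr_predict(x_layers, threshold=0.0):
    """Predict classes; -1 marks a rejected sample."""
    s = np.hstack([svm.decision_function(f)
                   for svm, f in zip(layer_svms, x_layers)])
    logits = combiner.decision_function(s)
    pred = combiner.classes_[np.argmax(logits, axis=1)]
    pred[np.max(logits, axis=1) < threshold] = -1   # reject low-confidence samples
    return pred

preds = dnr_predict(layers)
print("reject rate:", np.mean(preds == -1))
```

Raising `threshold` trades off rejection rate against the risk of accepting AEs, which is exactly the knob swept in the paper's security evaluation.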
Evaluated on MNIST and CIFAR10. The train/test split is as follows:
We average the results on five different runs. In each run, we consider 10,000 training samples and 1,000 test samples, randomly drawn from the corresponding datasets. The deep neural networks (DNNs) used in our experiments are pre-trained on a separate split of 30,000 and 40,000 training samples, respectively for MNIST and CIFAR10.
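The sampling protocol in the quote (five runs, each drawing 10,000 training and 1,000 test samples at random) can be sketched as below; the function name and the total-pool size are mine, for illustration only.

```python
import numpy as np

def draw_split(n_total, n_train=10_000, n_test=1_000, seed=0):
    """Draw disjoint train/test index sets at random (one experimental run)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_total)
    return idx[:n_train], idx[n_train:n_train + n_test]

# Five runs, each a fresh random draw; the DNN backbone is assumed to be
# pre-trained beforehand on a separate split (30k for MNIST, 40k for CIFAR10).
splits = [draw_split(60_000, seed=s) for s in range(5)]
```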
Accuracy against l2-norm white-box attacks:
The proposed method (DNR) is more accurate than the existing method (NR). It was also found that fooling DNR requires more structured perturbations.
Related work:
- rejects data outside the training distribution (does not evaluate rejection of AEs)
- uses only a single layer's features for rejection
- uses multiple layers' features for rejection, but requires generating AEs at training time
Paper link
https://arxiv.org/abs/1910.00470
Publication date (yyyy/mm/dd)
2019/10/01
Summary
Proposes a method for detecting AEs: samples with anomalous feature representations at different network layers are rejected. A new attack against this detection scheme is also proposed and used for evaluation. The proposed method is more accurate than existing methods.
Memo
TeX