Saehyung-Lee / OAT

This repository is the official implementation of "Removing Undesirable Feature Contributions using Out-of-Distribution Data", published as a conference paper at ICLR 2021.
https://arxiv.org/abs/2101.06639

Seeking for clarification on a few details #1

Closed: zjysteven closed this issue 3 years ago

zjysteven commented 3 years ago

Hi,

Thanks for releasing the code. I'm particularly interested in this work since I have been doing exactly the same thing recently (incorporating OOD samples to perform adversarial training). And I have a few questions about some details of your experiments.

1) Regarding the CIFAR experiments, are you using exactly the same 500K Tiny Images data as Carmon et al.?

The reason I'm asking is that, in my experiments, when using Carmon's 500K Tiny Images as the auxiliary data for CIFAR10, using the uniform distribution instead of pseudo-labels as the targets still provides a performance improvement over not using the additional data at all. What I was doing here seems to correspond to your CIFAR10 experiments and aligns well with the CIFAR10 results in Table 1 of your paper, right? However, Carmon's 500K images are carefully selected to be near- or in-distribution with respect to CIFAR, so I don't think this experiment really supports the claim that OOD data can help with robustness. In fact, it reveals something interesting: when including additional in-distribution data, even rather inaccurate targets can help improve robustness.
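
For concreteness, here is a minimal sketch of what I mean by uniform-distribution targets vs. pseudo-labels on the auxiliary batch (PyTorch-style; the function names, the weighting, and the omitted attack step are my own placeholders, not anything from this repo):

```python
import torch.nn.functional as F

def auxiliary_loss(model, x_aux_adv, pseudo_labels=None):
    """Loss on an (already perturbed) auxiliary batch.

    With pseudo_labels this is the RST-style term (cross-entropy to
    pseudo-labels); without them the target is the uniform distribution
    over classes, i.e. the Adv-OOD / OAT-style term described above.
    """
    log_probs = F.log_softmax(model(x_aux_adv), dim=1)
    if pseudo_labels is not None:
        return F.nll_loss(log_probs, pseudo_labels)
    # cross-entropy against the uniform target 1/K for every class:
    # -(1/K) * sum_k log p_k, averaged over the batch
    return -log_probs.mean(dim=1).mean()

def combined_loss(model, x_in_adv, y_in, x_aux_adv, aux_weight=1.0, pseudo_labels=None):
    # x_in_adv / x_aux_adv are assumed to come from some inner maximization
    # step (e.g. PGD), which is omitted here.
    in_dist_term = F.cross_entropy(model(x_in_adv), y_in)
    return in_dist_term + aux_weight * auxiliary_loss(model, x_aux_adv, pseudo_labels)
```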

So to see whether OOD data can really help, I also tried using Tiny ImageNet as the auxiliary data for adversarial training (compared with the carefully selected 500K Tiny Images, Tiny ImageNet is clearly more OOD for CIFAR). The results on ResNet18 are shown below (Adv-OOD uses exactly the same form of training objective as OAT).

| Method | Clean | PGD | AA |
| --- | --- | --- | --- |
| TRADES | 81.3 | 51.8 | 48.8 |
| RST (500K Tiny Images) | 83.7 | 55.7 | 51.7 |
| Adv-OOD (100K Tiny ImageNet) | 82.4 | 54.6 | 47.2 |

Interestingly, although incorporating OOD data (Tiny ImageNet) helps with PGD accuracy, it actually yields worse robustness against the stronger AA. I don't have a concrete explanation for this yet, but it looks like there was some gradient masking in this case.
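
In case it helps clarify the setup: by AA I mean the standard AutoAttack evaluation, roughly as in the sketch below (the epsilon and batch size here are just illustrative defaults, not necessarily what produced the table above):

```python
import torch
from autoattack import AutoAttack  # https://github.com/fra31/auto-attack

def eval_autoattack(model, x_test, y_test, eps=8/255, batch_size=128):
    """Standard AutoAttack evaluation; returns robust accuracy on x_test."""
    model.eval()
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    # run_standard_evaluation returns the adversarial examples; the library
    # also prints per-attack robust accuracies during the run.
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)
    with torch.no_grad():
        robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
    return robust_acc
```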

2) Regarding the ImageNet10 experiments, are you picking the best performing checkpoint or just using the last checkpoint to report the results?

The reason I'm asking this is that the ImageNet10 results seem to contradict what I observe in the table above (ImageNet990 is indeed OOD for ImageNet10, and based on my results I would not expect it to provide a significant improvement). However, as shown by [1], adversarial training suffers from robust overfitting, so the test robustness at the end of training can be much lower than the best case (which typically occurs right after the first learning rate decay and can be reached with simple early stopping). So I'm wondering whether the reported adversarial training results are the best-case ones.
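
To make the "best case" concrete, this is the kind of checkpoint selection I have in mind, with placeholder training/evaluation callables (not code from this repo):

```python
import copy

def train_with_robust_early_stopping(model, train_one_epoch, eval_robust_acc, num_epochs):
    """Keep the checkpoint with the highest robust test accuracy seen so far
    instead of the final-epoch checkpoint (cf. Rice et al. [1])."""
    best_acc, best_state = 0.0, None
    for epoch in range(num_epochs):
        train_one_epoch(model, epoch)      # one epoch of adversarial training
        acc = eval_robust_acc(model)       # e.g. PGD accuracy on a held-out split
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model, best_acc
```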

Thanks in advance, and I look forward to some discussion on this!

[1] Rice, Leslie, Eric Wong, and Zico Kolter. "Overfitting in adversarially robust deep learning." International Conference on Machine Learning. PMLR, 2020.

zjysteven commented 3 years ago

Some updates to make my questions/thoughts more clear.

According to the filename of your OOD data ("ti_1M_lowconfidence_unlabeled.pickle"), it seems that you are not directly using Carmon's data, which partly answers my first question. However, I'm still curious whether just any OOD data can help with robustness.

Since CIFAR10 (ID) is a subset of Tiny Images (OOD), and ImageNet10 (ID) and ImageNet990 (OOD) come from the same ImageNet dataset, my reading of Table 1 is that OOD data that are fairly "close" to the ID data can indeed help with robustness.

The results in Table 2 then touch on the case where the OOD data are "far" from the ID data. However, it is not clear whether OAT still improves overall robustness in this case, since Table 2 doesn't report results against AutoAttack.

Saehyung-Lee commented 3 years ago

Hi, thank you for your interest in our work!

  1. Regarding the experiments in Figure 1 of our paper, we used the same 500K Tiny Images data as Carmon et al. (note that 80M-TI in Table 1 of our paper denotes a dataset filtered to contain only OOD samples and is therefore disjoint from Carmon's 500K dataset), and the robustness improvement from applying our method with the 500K Tiny Images is still consistent with our findings. OAT enables us to suppress the contributions of features present in the images used. As you said, the 500K data are carefully selected to be near- or in-distribution with respect to CIFAR, so they are expected to share many robust and non-robust features with CIFAR-10. Therefore, even though OAT cannot improve adversarial robustness as much as RST (Carmon et al., 2019) does, it can still improve robustness by suppressing the contribution of the "non-robust" features present in the 500K Tiny Images. Of course, it may also suppress the contribution of the "robust" features residing in the 500K Tiny Images, but the symmetry between the effects on non-robust and robust features is broken for the following reason: based on previous work on non-robust features [1, 2], the adversarial vulnerability of standard classifiers indicates that CNNs are biased toward non-robust features. This bias creates a difference between the effect of OAT on robust features and its effect on non-robust features. In other words, the greater the influence of non-robust features relative to robust features, the greater the effect of OAT on non-robust features relative to robust features.
  2. Here, we provide the results for the OAT models (using SVHN, Simpson, and Fashion) against AA on CIFAR-10. In these results, we can see that the "far" OOD datasets still improve robustness against the strong adversarial attack, though the improvements are small.

     | OOD | None | SVHN | Simpson | Fashion |
     | --- | --- | --- | --- | --- |
     | Clean | 87.48 | 86.16 | 86.79 | 85.84 |
     | AA | 48.29 | 49.25 | 49.24 | 48.76 |
  3. In all experiments, we recorded the maximum adversarial robustness of the models on the test set after the first learning rate decay.

[1] Tsipras, Dimitris, et al. "Robustness May Be at Odds with Accuracy." International Conference on Learning Representations. 2018.
[2] Ilyas, Andrew, et al. "Adversarial Examples Are Not Bugs, They Are Features." Advances in Neural Information Processing Systems. 2019.

zjysteven commented 3 years ago

@Saehyung-Lee Thanks for the response!

  1. Overall, I tend to agree with your intuition about OAT's effect on non-robust/robust features; that's actually how I arrived at the same idea previously. The only thing that concerns me is the table of results in my first comment, which I will come back to in the next point. Meanwhile, regarding Figure 1, I'm a bit confused: I can't see what exactly the training objective of OAT+RST is. OAT should maximize the output entropy on the 500K Tiny Images data, while RST minimizes that entropy by providing pseudo-labels. Could you elaborate on this?

  2. Thanks for these results against AA. I'm still curious why I observed a gradient masking-like result in my experiment while you did not. Would you mind sharing your thoughts on this? From what I can tell, there are a few differences between the settings of your Table 1 and my table.

  3. In addition, when I saw the gradient masking-like results in my table, I suspected the reason was label smoothing, since the uniform-distribution target for OOD data is a form of extreme label smoothing (as sketched below). Two recent works [2, 3] show that even a moderately high degree of label smoothing (still less extreme than the uniform distribution) can already cause gradient masking.
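
To spell out what I mean by "extreme label smoothing" (just my framing, not anything taken verbatim from [2, 3]): with smoothing factor alpha the target interpolates between the one-hot label and the uniform distribution, and alpha = 1 recovers the pure uniform target used for the OOD data.

```python
import torch
import torch.nn.functional as F

def smoothed_target(labels, num_classes, alpha):
    """Label-smoothed target: (1 - alpha) * one_hot + alpha * uniform.

    alpha = 0.0 -> ordinary one-hot targets
    0 < alpha < 1 -> conventional label smoothing; [2, 3] argue that even
                     moderately large alpha can already cause gradient masking
    alpha = 1.0 -> the uniform-distribution target used for the OOD batch,
                   i.e. the extreme end of the same spectrum
    """
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    return (1.0 - alpha) * one_hot + alpha * uniform
```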

[1] Gowal, Sven, et al. "Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples." arXiv preprint arXiv:2010.03593 (2020).
[2] Pang, Tianyu, et al. "Bag of Tricks for Adversarial Training." arXiv preprint arXiv:2010.00467 (2020).
[3] Jiang, Linxi, et al. "Imbalanced Gradients: A New Cause of Overestimated Adversarial Robustness." arXiv preprint arXiv:2006.13726 (2020).

Saehyung-Lee commented 3 years ago

  1. The OAT+RST model was trained on the 500K Tiny Images data and 80M-TI (OOD) simultaneously (see the schematic sketch after this list). Please refer to Appendix H for more details.
  2. In your experimental setting, the use of the 500K Tiny Images seems to yield a limited robustness improvement compared to the improvements reported in the original RST work (Carmon et al., 2019). In light of this, some settings in your experiments appear to be inhibiting full utilization of the additional dataset. Alternatively, the Tiny ImageNet dataset may simply not be compatible with CIFAR-10 within our framework.
  3. In our results, we also observed a gap between the robustness of the OAT model under targeted and untargeted attacks. However, we do not think this gap is due to gradient masking, because OAT consistently improved robustness against all of the attacks we tested, even though the improvements were uneven. Instead of gradient masking, we hypothesize that the distributions of non-robust features involved in targeted and untargeted attacks are different, and that the gradient masking-like results are attributable to this separation. Please refer to Appendix G for more details.
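
Regarding point 1, schematically the OAT+RST objective combines the pseudo-labeled term on the 500K Tiny Images with the uniform-target term on 80M-TI, both computed on adversarially perturbed inputs. The sketch below is only illustrative; the weights and the exact formulation are given in Appendix H.

```python
# Schematic only; the exact objective, sampling, and weights are in Appendix H.
import torch.nn.functional as F

def oat_plus_rst_loss(model, x_in_adv, y_in, x_500k_adv, y_pseudo, x_ood_adv,
                      w_rst=1.0, w_oat=1.0):
    log_p_ood = F.log_softmax(model(x_ood_adv), dim=1)
    loss_in = F.cross_entropy(model(x_in_adv), y_in)          # labeled CIFAR-10 term
    loss_rst = F.cross_entropy(model(x_500k_adv), y_pseudo)   # pseudo-labeled 500K TI term
    loss_oat = -log_p_ood.mean(dim=1).mean()                  # uniform-target 80M-TI term
    return loss_in + w_rst * loss_rst + w_oat * loss_oat
```
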
zjysteven commented 3 years ago

OK, thanks for the clarification!