BierOne / ood_coverage

[ICLR 2024 Spotlight] Neuron Activation Coverage: Rethinking Out-of-distribution Detection and Generalization
https://openreview.net/forum?id=SNGXbZtK6Q
MIT License

Overfitting issue of NAC on ImageNet benchmark #1

Closed kai422 closed 1 year ago

kai422 commented 1 year ago

Hi, Nice work! I use ImageNet validation data as ID data, but accidentally use part of ImageNet training data (n06359193/ n06596364/ n06785654/ n06794110/ n06874185/) as OOD data.

python test_ood.py \
    --seed -1 \
    --name test_${METHOD}_sweep \
    --in_datadir /mnt/data/kai422/imagenet/val \
    --out_dataroot /home/kai/ood_coverage/dataset/ood_data \
    --batch $BATCH \
    --layer-name $LAYER \
    --model $MODEL \
    --logdir checkpoints/$MODEL/$LAYER \
    --score ${METHOD} \
    --out_datasets imagenet_train_n068

The results of OOD detection with ood_coverage are:

Metric       imagenet_train_n068
AUROC        99.94%
AUPR (In)    99.99%
AUPR (Out)   99.60%
FPR95        0.09%

But I think the model should be unable to distinguish imagenet_train from imagenet_val, since they come from the same distribution?

In contrast, the results of GradNorm are:

Metric       imagenet_train_n068
AUROC        57.14%
AUPR (In)    90.25%
AUPR (Out)   15.32%
FPR95        90.71%
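
For reference, "unable to distinguish" here would correspond to AUROC near 50% and FPR95 near 95%, which is roughly what GradNorm reports. A minimal sketch of how these two metrics are computed from detector scores (using hypothetical placeholder arrays, not this repo's actual evaluation code):

```python
# Sketch of the standard OOD metrics, assuming higher score = more in-distribution.
# scores_id / scores_ood are hypothetical placeholders for a detector's outputs.
import numpy as np
from sklearn.metrics import roc_auc_score

def auroc_fpr95(scores_id, scores_ood):
    labels = np.r_[np.ones_like(scores_id), np.zeros_like(scores_ood)]
    auroc = roc_auc_score(labels, np.r_[scores_id, scores_ood])
    thresh = np.percentile(scores_id, 5)       # threshold giving 95% TPR on ID data
    fpr95 = np.mean(scores_ood >= thresh)      # fraction of "OOD" still accepted as ID
    return auroc, fpr95

# Two score sets drawn from the same distribution give AUROC ~0.5 and FPR95 ~0.95
rng = np.random.default_rng(0)
print(auroc_fpr95(rng.normal(size=5000), rng.normal(size=5000)))
```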

Regards, Kai

BierOne commented 1 year ago

Hi, Kai

Nice question! I will analyze this phenomenon, and let you know soon!

Best, Yibing


kai422 commented 1 year ago

Thanks!

BierOne commented 1 year ago

Hi Kai,

Sorry for the late response. We carefully analyzed the method and found that the NAC function was overfitted to the InD ImageNet validation set, so any data points other than the validation data would be flagged as OOD. While this behaves like one-class classification, it is not acceptable for OOD detection.
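
To make the failure mode concrete, here is a toy 1-D sketch (it only mimics the binning intuition, not the actual NAC scoring function): if the coverage histogram is built from the very samples being evaluated, those samples always land in occupied bins, whereas fresh samples from the same distribution can land in empty bins once the binning is fine-grained, which yields spuriously high AUROC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
M = 50_000                                   # very fine binning
edges = np.linspace(-5, 5, M + 1)

val = rng.standard_normal(2_000)             # InD "test" set used to build coverage
fresh = rng.standard_normal(2_000)           # unseen samples from the SAME distribution

occupied = np.zeros(M, dtype=bool)
occupied[np.clip(np.digitize(val, edges) - 1, 0, M - 1)] = True

def coverage_score(x):                       # 1 if a sample falls in an occupied bin
    return occupied[np.clip(np.digitize(x, edges) - 1, 0, M - 1)].astype(float)

labels = np.r_[np.ones_like(val), np.zeros_like(fresh)]
print(roc_auc_score(labels, np.r_[coverage_score(val), coverage_score(fresh)]))
# prints an AUROC well above 0.9, although val and fresh share the same distribution
```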

To re-evaluate our method, we tried modeling the NAC function with a smaller $M$ on the training set, which achieves performance comparable to the recent SoTA method ASH. However, we still noticed several problematic settings in the current ImageNet benchmark, e.g., using InD test data to tune hyperparameters or select models, which could still limit the validity of the results.
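
As a self-contained counterpart to the toy above (again, the variable names and binning are illustrative assumptions, not the actual NAC implementation), fitting the coverage statistics on separate training data with a coarser binning removes the spurious separation:

```python
# Coverage built from held-out training data with a small number of bins (small M)
# no longer separates two same-distribution splits.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
M = 50                                        # coarse binning
edges = np.linspace(-5, 5, M + 1)

train = rng.standard_normal(2_000)            # held-out InD training data
val = rng.standard_normal(2_000)
fresh = rng.standard_normal(2_000)

occupied = np.zeros(M, dtype=bool)
occupied[np.clip(np.digitize(train, edges) - 1, 0, M - 1)] = True

def coverage_score(x):
    return occupied[np.clip(np.digitize(x, edges) - 1, 0, M - 1)].astype(float)

labels = np.r_[np.ones_like(val), np.zeros_like(fresh)]
print(roc_auc_score(labels, np.r_[coverage_score(val), coverage_score(fresh)]))
# prints roughly 0.5: val and fresh are no longer separable
```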

As such, we will further conduct experiments on the recent OpenOOD v1.5 benchmark and update our arXiv paper as soon as possible. Thanks for your interest in this work.

Best, Yibing

kai422 commented 1 year ago

Hi Yibing,

Thanks for your explanation; looking forward to the update :)

Regards, Kai

BierOne commented 1 year ago

Hi Kai,

I hope this message finds you well. We have recently updated both the arXiv paper and the repository. Without modifying the method, our NAC-UE still outperforms 21 SoTA OOD detection methods across three benchmarks (CIFAR-10, CIFAR-100, ImageNet-1k)!

Please see our arXiv paper for more information. Thank you for your continued interest!

Best, Yibing

Cverchen commented 1 year ago

Hello Yibing, I want to know if the overfitting issue on the test set has been resolved. Thanks.

Regards, Jiankang

BierOne commented 1 year ago

Hi Jiankang@Cverchen,

Thanks for your interest in this work! The answer is YES! We have resolved the overfitting issues in our new version of NAC-UE. Specifically, in this new version,

  1. For all of the experiments, we utilize the InD training data (e.g., 1000 training images) to build the NAC function.
  2. Following the evaluation protocol of OpenOOD, we employ the InD and OOD validation set to select the best hyperparameters. This helps us avoid overfitting to the InD test data.
  3. By employing a subset of training data for a sanity check (as described by Kai above), the FPR95 of our NAC-UE is about 95% on the ImageNet benchmark (higher is better for this check, since InD training data should not be separable from InD test data). This is comparable to existing baselines such as ASH (95%).

Note that in this new version, the core method of NAC-UE remains unchanged. Compared to the first version, we simply explored using neurons from multiple layers for OOD detection, which achieves improved performance. You can find more details in our arXiv paper.
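
As a purely illustrative aggregation (an assumption for the sake of example, not necessarily how NAC-UE combines layers), per-layer coverage scores could simply be averaged into one OOD score:

```python
# Illustrative only: average hypothetical per-layer coverage scores per sample.
import numpy as np

per_layer = {                                 # hypothetical per-sample coverage scores
    "layer3": np.array([0.92, 0.15, 0.80]),
    "layer4": np.array([0.88, 0.10, 0.75]),
}
combined = np.mean(np.stack(list(per_layer.values())), axis=0)
print(combined)                               # higher -> more likely in-distribution
```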

Once again, we appreciate your continued interest! Kindly let me know if you have any questions.

Best, Yibing

Cverchen commented 1 year ago

Thanks!

BierOne commented 1 year ago

Additionally, we would like to mention that overfitting to the test set is a common problem in current OOD detection methods. This is primarily because many approaches rely on tuning hyperparameters or selecting models based on the InD test set, e.g., [1-4].

While our previous version was also affected by this issue, it is crucial to emphasize that this does not imply any form of experimental misconduct. Rather, we followed the evaluation protocol that is prevalent in OOD benchmarks, though in hindsight this is not a suitable practice [5]. We sincerely hope that this clarification helps address any concerns regarding our previous paper.

[1] S. Liang, et al. Enhancing the Reliability of Out-of-distribution Image Detection in Neural Networks. ICLR, 2018.
[2] S. Kong, et al. OpenGAN: Open-Set Recognition via Open Data Generation. ICCV, 2021.
[3] K. Lee, et al. A Simple Unified Framework for Detecting Out-of-distribution Samples and Adversarial Attacks. NeurIPS, 2018.
[4] Y.-C. Hsu, et al. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data. CVPR, 2020.
[5] J. Zhang, et al. OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection. arXiv, 2023.

BierOne commented 1 year ago

Glad to hear that! I just posted another response :)

Thanks!