Hi, nice work! I used the ImageNet validation data as ID data, but accidentally used part of the ImageNet training data (n06359193/, n06596364/, n06785654/, n06794110/, n06874185/) as OOD data:

```bash
python test_ood.py \
    --seed -1 \
    --name test_${METHOD}_sweep \
    --in_datadir /mnt/data/kai422/imagenet/val \
    --out_dataroot /home/kai/ood_coverage/dataset/ood_data \
    --batch $BATCH \
    --layer-name $LAYER \
    --model $MODEL \
    --logdir checkpoints/$MODEL/$LAYER \
    --score ${METHOD} \
    --out_datasets imagenet_train_n068
```

The OOD detection results of ood_coverage are:

| Metric | imagenet_train_n068 |
| --- | --- |
| AUROC | 99.94% |
| AUPR (In) | 99.99% |
| AUPR (Out) | 99.60% |
| FPR95 | 0.09% |

But I think the model should be unable to distinguish imagenet_train from imagenet_val, since they come from the same distribution?

In contrast, the results of GradNorm are:

| Metric | imagenet_train_n068 |
| --- | --- |
| AUROC | 57.14% |
| AUPR (In) | 90.25% |
| AUPR (Out) | 15.32% |
| FPR95 | 90.71% |
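For reference, here is a minimal sketch of how these four metrics can be computed from raw detection scores; `scores_in` and `scores_out` are hypothetical NumPy arrays where a higher score means "more likely ID":

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def ood_metrics(scores_in, scores_out):
    """AUROC, AUPR (In), AUPR (Out), and FPR at 95% TPR.

    scores_in / scores_out: 1-D arrays of detection scores for ID / OOD
    samples, where higher values mean "more likely in-distribution".
    """
    scores = np.concatenate([scores_in, scores_out])
    # ID is the positive class for AUROC and AUPR (In).
    labels = np.concatenate([np.ones(len(scores_in)), np.zeros(len(scores_out))])

    auroc = roc_auc_score(labels, scores)
    aupr_in = average_precision_score(labels, scores)
    # For AUPR (Out), treat OOD as the positive class and flip the score order.
    aupr_out = average_precision_score(1 - labels, -scores)

    # FPR95: fraction of OOD passing the threshold that retains 95% of ID.
    threshold = np.percentile(scores_in, 5)  # 95% of ID scores lie above this
    fpr95 = np.mean(scores_out >= threshold)
    return auroc, aupr_in, aupr_out, fpr95
```

If the two splits really come from the same distribution, a sound detector should give AUROC near 50% and FPR95 near 95% here, which is roughly what GradNorm shows.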
Regards, Kai
Hi Kai,

Nice question! I will analyze this phenomenon and let you know soon!

Best, Yibing
Thanks!
Hi Kai,
Sorry for the late response. We carefully analyzed the method and found that the NAC function was overfitted to the InD ImageNet validation set, so any data points other than the validation data would be flagged as OOD. While this behaves like one-class classification, it is not acceptable for OOD detection.

To re-evaluate our method, we tried modeling the NAC function with a smaller $M$ on the training set, which achieves performance comparable to the recent SoTA method ASH. However, we still noticed several problematic settings in the current ImageNet benchmark, e.g., using InD test data to tune hyperparameters or select models, which could still limit the validity of the results.
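To make the fix concrete, here is a simplified, self-contained stand-in for a coverage-style score (per-neuron activation histograms). This is not the actual NAC implementation; it only illustrates the corrected protocol of fitting the statistics on training features and scoring test data afterwards:

```python
import numpy as np

def fit_coverage(train_feats, n_bins=50):
    """Fit per-neuron activation histograms on TRAINING features only.

    Toy stand-in for a coverage function (NOT the actual NAC): the point
    is that the statistics come from train data, never from the test set.
    """
    lo, hi = train_feats.min(axis=0), train_feats.max(axis=0)
    cov = []
    for j in range(train_feats.shape[1]):
        hist, edges = np.histogram(train_feats[:, j], bins=n_bins,
                                   range=(lo[j], hi[j]))
        cov.append((hist / hist.sum(), edges))
    return cov

def coverage_score(feats, cov):
    """Average 'coveredness' of each sample's neuron activations."""
    scores = np.zeros(len(feats))
    for j, (probs, edges) in enumerate(cov):
        # Map each activation to its training histogram bin (clipped to range).
        idx = np.clip(np.digitize(feats[:, j], edges) - 1, 0, len(probs) - 1)
        scores += probs[idx]  # high when activations land in well-covered bins
    return scores / len(cov)
```

Fitted this way, held-out ImageNet training images should score on par with validation images, so they no longer look artificially OOD.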
As such, we will conduct further experiments on the recent OpenOOD v1.5 benchmark and update our arXiv paper ASAP. Thanks for your interest in this work.
Best, Yibing
Hi Yibing,
Thanks for your explanation, looking forward to the update :)
Regards, Kai
Hi Kai,
I hope this message finds you well. We have recently updated both the arXiv paper and the repository. Without modifying the method, our NAC-UE still outperforms 21 SoTA OOD detection methods across three benchmarks (CIFAR-10, CIFAR-100, ImageNet-1k)!

Please see our arXiv paper for more information. Thank you for your continued interest!
Best, Yibing
Hello Yibing, I want to know if the overfitting issue on the test set has been resolved. Thanks.
Regards, Jiankang
Hi Jiankang (@Cverchen),
Thanks for your interest in this work! The answer is YES! We have resolved the overfitting issue in our new version of NAC-UE. Specifically, in this new version, we build the NAC function on training data rather than on the InD test set, and we follow the OpenOOD protocol so that hyperparameters are no longer tuned on InD test data.

Note that the core method of our NAC-UE remains unchanged! Compared to the first version, we just explored using multiple layers of neurons for OOD detection, which further improves performance. You can find more details in our arXiv paper.
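To illustrate the multi-layer idea, here is a toy continuation of the histogram sketch above (hypothetical layer names, random stand-in features, and plain averaging as the combination rule, which is an assumption rather than the exact scheme in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in features: {layer: (train_feats, eval_feats)} per layer.
feats_by_layer = {
    "layer3": (rng.normal(size=(1000, 64)), rng.normal(size=(200, 64))),
    "layer4": (rng.normal(size=(1000, 32)), rng.normal(size=(200, 32))),
}

# Fit one coverage model per layer on TRAIN features, then average the
# per-layer scores of the evaluated samples (fit_coverage / coverage_score
# are the toy helpers from the sketch above).
cov = {k: fit_coverage(tr) for k, (tr, _) in feats_by_layer.items()}
scores = np.mean(
    [coverage_score(ev, cov[k]) for k, (_, ev) in feats_by_layer.items()], axis=0
)
```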
Once again, we appreciate your continued interest! Kindly let me know if you have any questions.
Best, Yibing
Thanks!
Additionally, we would like to mention that overfitting to the test set is a common problem among current OOD detection methods. This is primarily because many approaches rely on tuning hyperparameters or selecting models based on the InD test set, e.g., [1-4].

While our previous version was also affected by this issue, it is crucial to emphasize that this does not imply any form of experimental misconduct. Rather, we followed the evaluation protocol that was prevalent in OOD benchmarks, though in hindsight this is not a suitable practice [5]. We sincerely hope this clarification helps address any concerns regarding our previous paper.
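As a concrete illustration, consider tuning a temperature hyperparameter (as in ODIN [1]) for a toy max-softmax detector. The sketch below, with random stand-in logits, shows the leakage-free protocol: hyperparameters are selected on disjoint validation splits, and the test splits are touched exactly once for the reported number:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def evaluate(temperature, ind_logits, ood_logits):
    """AUROC of a temperature-scaled max-softmax score (toy detector)."""
    def msp(logits):
        z = logits / temperature
        p = np.exp(z - z.max(axis=1, keepdims=True))
        return (p / p.sum(axis=1, keepdims=True)).max(axis=1)
    scores = np.concatenate([msp(ind_logits), msp(ood_logits)])
    labels = np.concatenate([np.ones(len(ind_logits)), np.zeros(len(ood_logits))])
    return roc_auc_score(labels, scores)

# Disjoint validation and test splits (random logits stand in for a model).
val_ind,  val_ood  = rng.normal(1, 1, (500, 10)), rng.normal(0, 1, (500, 10))
test_ind, test_ood = rng.normal(1, 1, (500, 10)), rng.normal(0, 1, (500, 10))

# Tune the hyperparameter on the VALIDATION splits only...
best_t = max([0.5, 1.0, 2.0, 5.0], key=lambda t: evaluate(t, val_ind, val_ood))

# ...and use the TEST splits exactly once, for the reported number.
print("test AUROC:", evaluate(best_t, test_ind, test_ood))
```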
[1] S. Liang, et al. Enhancing the Reliability of Out-of-Distribution Image Detection in Neural Networks. ICLR, 2018.
[2] S. Kong, et al. OpenGAN: Open-Set Recognition via Open Data Generation. ICCV, 2021.
[3] K. Lee, et al. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. NeurIPS, 2018.
[4] Y.-C. Hsu, et al. Generalized ODIN: Detecting Out-of-Distribution Image Without Learning from Out-of-Distribution Data. CVPR, 2020.
[5] J. Zhang, et al. OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection. arXiv, 2023.
Glad to hear that! I just posted another response :)
Thanks!