Jingkang50 / OpenOOD

Benchmarking Generalized Out-of-Distribution Detection

I can't find the far-OOD validation dataset #166

Closed ma-kjh closed 1 year ago

ma-kjh commented 1 year ago

First of all, thank you for your benchmark dataset research.

I have a question: I can't find the far-OOD validation datasets.

As the paper mentions, the evaluation protocol should use both ID and OOD validation sets,

but I can't find far-OOD validation datasets such as SVHN, MNIST, Textures, or Places365.

Should I just use the ID validation dataset and the test OOD datasets?

zjysteven commented 1 year ago

Hi @ma-kjh, by default the val OOD samples in each setting come from only one dataset rather than multiple datasets. For example, in the CIFAR case the val OOD data come from Tiny ImageNet. We feel this is more realistic than collecting val OOD data from every OOD type (e.g., SVHN, Textures, etc.), since in the real world you often won't know beforehand which OOD types will appear in the environment until you actually deploy the model.
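
If it helps, here is a minimal way to confirm this from the downloaded image lists; the directory layout and the val_tin.txt file name are assumptions based on the v1.5 layout, so adjust the paths to your setup:

```python
# Minimal sketch (not part of OpenOOD itself): list the val image lists shipped
# with the CIFAR-10 benchmark and peek at the val-OOD one (Tiny ImageNet).
from pathlib import Path

imglist_dir = Path("data/benchmark_imglist/cifar10")  # adjust to where you unzipped the data
print(sorted(p.name for p in imglist_dir.glob("val_*.txt")))

with open(imglist_dir / "val_tin.txt") as f:  # assumed name of the Tiny ImageNet val list
    for line in list(f)[:3]:                  # each line: "<relative path> <label>"
        print(line.strip())
```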

ma-kjh commented 1 year ago

Thank you! I understand.

ma-kjh commented 1 year ago

Hi. I have a question.

Where is the Tiny ImageNet validation OOD dataset?


When I look into the benchmark_imglist/cifar10 directory, I can only find val_cifar100.txt and val_cifar10.txt.

zjysteven commented 1 year ago

Where did you download benchmark_imglist.zip from? It seems to be missing many files. I checked on my end and the archive is correct. Can you try `gdown 1XKzBdWCqg3vPoj-D32YixJyJJ0hL63gP`, which is the same file ID encoded here: https://github.com/Jingkang50/OpenOOD/blob/8d44375e4c695d03d2b97850b754f24fd4bda447/scripts/download/download.py#L73
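
In case it is easier to do this from Python, here is a rough sketch using gdown's Python API (the file ID is the one above; the output and extraction paths are just examples):

```python
# Sketch: download and unzip benchmark_imglist.zip via gdown's Python API.
import zipfile

import gdown

zip_path = gdown.download(id="1XKzBdWCqg3vPoj-D32YixJyJJ0hL63gP",
                          output="benchmark_imglist.zip", quiet=False)
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall("data/")  # the extracted cifar10 folder should include val_tin.txt
```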

ma-kjh commented 1 year ago


I downloaded the dataset from your cloud. (https://entuedu-my.sharepoint.com/personal/jingkang001_e_ntu_edu_sg/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fjingkang001%5Fe%5Fntu%5Fedu%5Fsg%2FDocuments%2Fopenood&ga=1)

I will try using download.py! Thanks.

zjysteven commented 1 year ago

Yeah, sorry for the confusion. Those files were outdated (used in v1.0). For all data and pre-trained models, please refer to the v1.5 (up-to-date) section in the README.

ma-kjh commented 1 year ago

Thank you !

ma-kjh commented 1 year ago

Hi @zjysteven

In Appendix A, it says Places365 had 1,305 images removed "due to semantic overlap".

I have a question: what counts as semantic overlap, and how do you measure it?

I can't find this explained in the paper.

In the Tiny ImageNet case, you identified semantically overlapping classes such as bullfrog - frog and Labrador retriever - dog,

but I don't understand how semantic overlap can be measured for Places365.

Thank you!


zjysteven commented 1 year ago

@Jingkang50 can answer this part better than me because it is directly inherited from OpenOOD v1 (where Jingkang took the lead).

My best guess is that both filtering processes were similar: you first look at the class names and then do some manual, visual inspection if necessary. I don't think there's anything special about filtering Places365.

ma-kjh commented 1 year ago

I also guessed that both filtering processes would be similar, but when I checked the class names in test_places365.txt in the cifar100 benchmark, there are still 365 classes, including ones such as 'bridge'.
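
Roughly, this is how I checked it (a quick sketch; it assumes the class name appears as the parent directory in each image path, which may need adjusting):

```python
# Rough check of how many Places365 classes remain in the cifar100 benchmark list.
from pathlib import Path

classes = set()
with open("data/benchmark_imglist/cifar100/test_places365.txt") as f:
    for line in f:
        rel_path = line.split()[0]               # each line: "<relative path> <label>"
        classes.add(Path(rel_path).parent.name)  # parent directory taken as the class name

print(len(classes), "classes found; 'bridge' present:", "bridge" in classes)
```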

That's why I asked the question.

Thank you!

zjysteven commented 1 year ago

I see your point. I have to say that making OOD datasets completely "ID-free" is very difficult, and you can't guarantee anything until you manually inspect images one by one. Such difficulty is best illustrated in a recent paper called NINCO, although as discussed in our appendix, in practice the effect of the noisy samples in OOD datasets may not be that significant.

ma-kjh commented 1 year ago

Thanks for helping me understand.

@Jingkang50

Jingkang50 commented 1 year ago

Thank you @zjysteven and @ma-kjh for the insightful discussion. The Places365 filtering was done in Semantically Coherent Out-of-Distribution Detection (https://arxiv.org/abs/2108.11941), where we used a very strong CIFAR-100 model (BiT, if I remember correctly) to compute and sort the ID-ness of each Places365 class. We then manually inspected the classes with the highest ID-ness and removed entire classes if they were really ID. Admittedly, it seems some ID samples were not removed, but per the NINCO work they may not affect results that much. Nevertheless, I will discuss with the OpenOOD team whether to provide something like a purified Places365 in the future, though even a purified version might still be noisy. So for now I would rather address the accurate-evaluation problem by evaluating on more OOD datasets, or on more challenging full-spectrum OOD benchmarks.
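
To make the procedure a bit more concrete, here is a rough sketch of that ID-ness ranking (not the original SC-OOD code; the CIFAR-100 model and the Places365 loader are placeholders, and the loader is assumed to yield (images, class_names) batches):

```python
# Sketch: score every Places365 image with a strong CIFAR-100 classifier,
# average the max softmax probability per class ("ID-ness"), and sort classes
# by that average; the top classes are candidates for manual inspection/removal.
from collections import defaultdict

import torch
import torch.nn.functional as F


@torch.no_grad()
def rank_idness(cifar100_model, places365_loader, device="cuda"):
    sums, counts = defaultdict(float), defaultdict(int)
    cifar100_model.eval().to(device)
    for images, class_names in places365_loader:
        probs = F.softmax(cifar100_model(images.to(device)), dim=1)
        msp = probs.max(dim=1).values  # max softmax probability as an ID-ness proxy
        for name, score in zip(class_names, msp.tolist()):
            sums[name] += score
            counts[name] += 1
    idness = {c: sums[c] / counts[c] for c in sums}
    return sorted(idness.items(), key=lambda kv: kv[1], reverse=True)
```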

ma-kjh commented 1 year ago

Thank you for the detailed description!