Closed · dlotfi closed this 2 months ago
The short answer is that we filtered Tiny ImageNet with respect to each ID dataset to remove ambiguous OOD images that might actually be ID. This is why you see different numbers of images. For example, for MNIST there is no such ambiguity, so all 10,000 Tiny ImageNet images are used (no samples are removed). Does this make sense?
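The filtering step described above could look roughly like the sketch below: keep only the Tiny ImageNet test images not flagged as ambiguous for a given ID dataset. The function name `filter_ood_imglist`, the variable names, and the example entries are all illustrative assumptions, not the actual benchmark code or real imglist paths.

```python
# Hedged sketch of per-ID-dataset OOD filtering (illustrative, not the
# benchmark's real implementation or file layout).

def filter_ood_imglist(imglist, ambiguous_ids):
    """Drop OOD candidate images flagged as ambiguous (possibly ID)."""
    return [path for path in imglist if path not in ambiguous_ids]

# Hypothetical Tiny ImageNet test list of 10,000 entries.
tiny_imagenet_test = [f"tin/test/images/test_{i}.JPEG" for i in range(10_000)]

# For MNIST, no Tiny ImageNet image plausibly looks like a digit,
# so the ambiguous set is empty and nothing is removed.
ambiguous_for_mnist = set()

# For a natural-image ID dataset, some entries might be flagged (made-up example).
ambiguous_for_cifar10 = {"tin/test/images/test_3.JPEG"}

print(len(filter_ood_imglist(tiny_imagenet_test, ambiguous_for_mnist)))    # 10000
print(len(filter_ood_imglist(tiny_imagenet_test, ambiguous_for_cifar10)))  # 9999
```

This is why the MNIST setting keeps the full 10,000 images while other ID datasets end up with slightly smaller Tiny ImageNet OOD sets.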
Yes, that makes sense. I actually remember reading about this in your paper but forgot, sorry for that. Given that the number of samples used varies depending on the ID dataset, do you think this could impact the comparability of the metrics on Tiny ImageNet across different settings?
To a certain extent, yes. But is there a particular case where you need to compare the metrics between, say, (CIFAR-10 vs. Tiny ImageNet) and (CIFAR-100 vs. Tiny ImageNet)?
You're right. I don’t have a specific case in mind. Thanks for your answer.
I'm noticing a discrepancy in the size of the Tiny ImageNet test set used as an OOD dataset across the different ID datasets in the latest benchmark_imglist files:
Could you clarify why these sizes differ?