Closed · dlotfi closed this 2 months ago
The short answer is that we filtered Tiny ImageNet with respect to each ID dataset to remove ambiguous OOD images that might actually be ID. This is why you see different numbers of images. For example, for MNIST there is no such ambiguity, so all 10,000 Tiny ImageNet images are used (no samples are removed). Does this make sense?
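The filtering step described above could look roughly like the sketch below: keep only the Tiny ImageNet test images not flagged as ambiguous for a given ID dataset. The function name `filter_ood_imglist`, the variable names, and the example entries are all illustrative assumptions, not the actual benchmark code or real imglist paths.

```python
# Hedged sketch of per-ID-dataset OOD filtering (illustrative, not the
# benchmark's real implementation or file layout).

def filter_ood_imglist(imglist, ambiguous_ids):
    """Drop OOD candidate images flagged as ambiguous (possibly ID)."""
    return [path for path in imglist if path not in ambiguous_ids]

# Hypothetical Tiny ImageNet test list of 10,000 entries.
tiny_imagenet_test = [f"tin/test/images/test_{i}.JPEG" for i in range(10_000)]

# For MNIST, no Tiny ImageNet image plausibly looks like a digit,
# so the ambiguous set is empty and nothing is removed.
ambiguous_for_mnist = set()

# For a natural-image ID dataset, some entries might be flagged (made-up example).
ambiguous_for_cifar10 = {"tin/test/images/test_3.JPEG"}

print(len(filter_ood_imglist(tiny_imagenet_test, ambiguous_for_mnist)))    # 10000
print(len(filter_ood_imglist(tiny_imagenet_test, ambiguous_for_cifar10)))  # 9999
```

This is why the MNIST setting keeps the full 10,000 images while other ID datasets end up with slightly smaller Tiny ImageNet OOD sets.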
Yes, that makes sense. I actually remember reading about this in your paper but forgot, sorry for that. Given that the number of samples used varies depending on the ID dataset, do you think this could impact the comparability of the metrics on Tiny ImageNet across different settings?
To a certain extent, yes. But is there a particular case where you need to compare the metrics between, say, (CIFAR-10 vs. Tiny ImageNet) and (CIFAR-100 vs. Tiny ImageNet)?
You're right. I don’t have a specific case in mind. Thanks for your answer.
I'm noticing a discrepancy in the size of the Tiny ImageNet test set used as an OOD dataset across the different ID datasets in the latest benchmark_imglist files:
Could you clarify why these sizes differ?