Closed syzymon closed 3 years ago
The dataset generation process was automated. Do you know whether the original ImageNet also contains duplicates? If it does not, then something is indeed wrong.
Please be aware that the dataset generated by us is hosted here: http://image-net.org/download-images
Your link refers to a different dataset. Maybe this is the source of the confusion.
I have downloaded the images from http://image-net.org/download-images, and the validation set there indeed contains only 29 duplicates, which is much better. The dataset with a lot of duplicates (http://image-net.org/small/valid_32x32.tar) comes from this paper: https://arxiv.org/pdf/1601.06759.pdf. I don't know where the authors took their data from, but it seems that the raw .png data already contains the duplicates. Sorry for the confusion.
Thanks
Hi,
are you aware that the validation set of 49999 images downloaded from here:
http://image-net.org/small/valid_32x32.tar
has a lot of duplicate images? A few example pairs to reproduce: 02273.png and 42263.png; 04990.png and 45295.png.
Overall, there are only ~45047 unique images in the validation set: about 5k of them occur twice, and a few even three times. Is that intended, to give some examples more weight in the validation score, or is it a bug? I am also wondering whether this applies to the 64x64 version; I haven't tested that yet.
Thanks
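For anyone who wants to reproduce the count, here is a minimal sketch of one way to check. It is not necessarily how the numbers above were obtained. It assumes the archive has been extracted into a local directory, and it only detects files whose bytes are identical; images with the same pixels but a different PNG encoding would need to be decoded first (for example with Pillow) and hashed on the pixel data. The function name `count_duplicates` and the directory name in the usage note are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def count_duplicates(directory, pattern="*.png"):
    """Return (total files, unique contents, groups of byte-identical files).

    Assumption: duplicates are byte-identical files. Files are matched
    recursively under `directory` using the glob `pattern`.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).rglob(pattern)):
        # Hash the raw file bytes; equal digests mean identical files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path.name)

    total = sum(len(names) for names in groups.values())
    unique = len(groups)
    # Keep only contents that appear in more than one file.
    duplicate_groups = [names for names in groups.values() if len(names) > 1]
    return total, unique, duplicate_groups
```

Calling `count_duplicates("valid_32x32")` on the extracted folder (the path is an assumption) returns the total number of files, the number of distinct file contents, and the lists of file names that share the same content, so the pairs mentioned above can be checked directly.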