Open andrearosasco opened 2 years ago
OK, I think this is caused by the fact that some COCO images have no annotations. I have to verify that.
Edit: confirmed. There are 702 images without annotations in train2014 and 367 in val2014: 46 of these images ended up in 5k.json while the others got removed. It all makes sense. Sorry, I don't have much experience with COCO.
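For anyone who wants to repeat this check, here is a minimal sketch of how images without annotations could be counted. It assumes the files follow the standard COCO instances layout (an "images" list with "id" fields and an "annotations" list with "image_id" fields); the helper names and the toy data are made up for illustration and are not taken from the repository.

```python
import json


def load_coco(path):
    """Load a COCO-style annotation file; the path is supplied by the caller."""
    with open(path) as f:
        return json.load(f)


def images_without_annotations(coco):
    """Return the ids of images that no annotation refers to."""
    image_ids = {img["id"] for img in coco["images"]}
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    return image_ids - annotated_ids


# Tiny made-up example, not real COCO data.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "bbox": [0, 0, 10, 10]},
        {"image_id": 1, "bbox": [5, 5, 20, 20]},
    ],
}
print(sorted(images_without_annotations(toy)))  # [2, 3]
```

On a real file one would call `images_without_annotations(load_coco(path))` with the path to the annotation file and take the length of the result.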
Hello, I'm looking at the content of 5k.json and trainvalno5k.json and I found the following issues:
- Compared to trainval2014, the number of missing images is 6023: 5321 originally from val2014 and 702 from train2014. All of the pictures in 5k.json are from val2014, so 702 images from train2014 and 321 from val2014 went missing.
- In 5k.json we have 35511 bounding boxes that reference 4954 images, so I think 46 test images are never used. Also, where are the annotations for those 46 images? If they were still in trainvalno5k.json they would cause errors, so I guess they have been deleted?
- If we look at the instances removed from trainval2014 to generate trainvalno5k.json, we see that they match the instances added to 5k.json. Does it mean that there are bounding boxes referring to the 6023 - 5000 missing images that have been deleted from trainvalno5k.json?

I know these issues are unlikely to cause any significant problem, but still, I'd like to see the rationale behind this. Is the script that made those splits available anywhere?
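The counts in this thread could be reproduced roughly as follows. This is only a sketch under the assumption that all files use the standard COCO layout with an "images" list; `image_ids` and `missing_images` are hypothetical helper names, and the toy dictionaries stand in for the real files. The last lines simply check that the numbers reported above are consistent with each other.

```python
def image_ids(coco):
    """Set of image ids listed in a COCO-style dictionary."""
    return {img["id"] for img in coco["images"]}


def missing_images(full, *splits):
    """Ids present in `full` but in none of the given splits."""
    kept = set().union(*(image_ids(s) for s in splits))
    return image_ids(full) - kept


# Made-up stand-ins for trainval2014, trainvalno5k.json and 5k.json.
full = {"images": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]}
no5k = {"images": [{"id": 1}, {"id": 2}]}
five_k = {"images": [{"id": 3}]}
print(sorted(missing_images(full, no5k, five_k)))  # [4]

# Consistency check on the numbers quoted in this thread.
missing_total, from_val, from_train = 6023, 5321, 702
images_in_5k, annotated_in_5k = 5000, 4954
val_dropped = from_val - images_in_5k           # val2014 images in neither file
unannotated_in_5k = images_in_5k - annotated_in_5k
print(val_dropped, unannotated_in_5k, val_dropped + unannotated_in_5k)
```

The final sum is the number of val2014 images without annotations implied by these figures, which can be compared with the 367 reported in the follow-up comment.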