bingykang / Fewshot_Detection

Few-shot Object Detection via Feature Reweighting
https://arxiv.org/abs/1812.01866

Inconsistencies in COCO splits #84

Open andrearosasco opened 2 years ago

andrearosasco commented 2 years ago

Hello, I'm looking at the content of 5k.json and trainvalno5k.json and I found the following issues:

  1. Compared with the original trainval2014, trainvalno5k.json is missing 6023 images: 5321 from val2014 and 702 from train2014. All of the images in 5k.json are from val2014, so 702 images from train2014 and 321 from val2014 appear in neither file.
  2. In 5k.json there are 35511 bounding boxes that reference 4954 images, so I think 46 test images are never used. Also, where are the annotations for those 46 images? If they were still in trainvalno5k.json they would cause errors, so I guess they have been deleted?
  3. If we look at the bounding boxes removed from the original trainval2014 to generate trainvalno5k.json, we see that they match the instances added to 5k.json. Does that mean the bounding boxes referring to the 6023 - 5000 = 1023 missing images have been deleted from trainvalno5k.json?
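The counts in the list above can be reproduced with plain set operations on image ids. The following is only a minimal sketch: it assumes all three files follow the standard COCO layout (an `images` list with `id` fields and an `annotations` list with `image_id` fields), and the helper names and the toy data are made up for illustration. With the real files, each dictionary would come from `json.load` instead.

```python
def image_ids(coco):
    """Ids of all images listed in a COCO-style dictionary."""
    return {img["id"] for img in coco["images"]}


def annotated_image_ids(coco):
    """Ids of images referenced by at least one annotation."""
    return {ann["image_id"] for ann in coco["annotations"]}


def missing_from_splits(original, *splits):
    """Ids present in the original set but listed in none of the splits."""
    kept = set()
    for split in splits:
        kept |= image_ids(split)
    return image_ids(original) - kept


# Toy stand-ins for trainval2014, 5k.json and trainvalno5k.json (hypothetical data).
original = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}],
    "annotations": [{"image_id": 1}, {"image_id": 2}, {"image_id": 2}],
}
test_split = {
    "images": [{"id": 1}, {"id": 3}],
    "annotations": [{"image_id": 1}],
}
train_split = {
    "images": [{"id": 2}],
    "annotations": [{"image_id": 2}, {"image_id": 2}],
}

# Images that appear in neither split (point 1).
missing = missing_from_splits(original, test_split, train_split)

# Images listed in the test split that no bounding box references (point 2).
unused = image_ids(test_split) - annotated_image_ids(test_split)

print(sorted(missing), sorted(unused))
```

In the toy data, image 4 is in neither split and image 3 is listed in the test split without any bounding box, which mirrors the two kinds of discrepancy described above.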

I know these issues are unlikely to cause any significant problem, but still, I'd like to see the rationale behind this. Is the script that made those splits available anywhere?

andrearosasco commented 2 years ago

OK, I think this is caused by COCO images that have no annotations. I still have to verify that.

Edit: Alright, there are 702 images without annotations in train2014 and 367 in val2014. 46 of these ended up in 5k.json, while the other 321 from val2014 and all 702 from train2014 were removed, which accounts for the 1023 missing images. It all makes sense. Sorry, I don't have much experience with COCO.
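A rough way to check this kind of count is to group the images that no annotation references by the split they came from. This sketch assumes COCO-format data and relies on the COCO 2014 file naming scheme (`COCO_train2014_...jpg`, `COCO_val2014_...jpg`) to recover the source split. The function name and the toy data are hypothetical.

```python
from collections import Counter


def unannotated_by_source(coco):
    """Count images with no annotations, grouped by their source split."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    counts = Counter()
    for img in coco["images"]:
        if img["id"] not in annotated:
            # File names look like COCO_val2014_000000000042.jpg,
            # so the second underscore-separated field is the split name.
            source = img["file_name"].split("_")[1]
            counts[source] += 1
    return counts


# Toy data for illustration only (not real COCO ids).
toy = {
    "images": [
        {"id": 1, "file_name": "COCO_train2014_000000000001.jpg"},
        {"id": 2, "file_name": "COCO_train2014_000000000002.jpg"},
        {"id": 3, "file_name": "COCO_val2014_000000000003.jpg"},
        {"id": 4, "file_name": "COCO_val2014_000000000004.jpg"},
    ],
    "annotations": [{"image_id": 1}, {"image_id": 3}, {"image_id": 3}],
}

counts = unannotated_by_source(toy)
print(dict(counts))
```

Run against the merged trainval2014 annotations, a function like this should report the per-split totals of unannotated images quoted above, if the assumptions hold.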