Inconsistencies in COCO splits

Hello, I've been looking at the contents of 5k.json and trainvalno5k.json and found the following issues:

Compared with the original trainval2014, 6023 images are missing from trainvalno5k.json: 5321 originally from val2014 and 702 from train2014. All of the images in 5k.json come from val2014, so 702 images from train2014 and 321 (5321 - 5000) from val2014 do not appear in either split.
5k.json contains 35511 bounding boxes that reference only 4954 images, so I think 46 of the 5000 test images are never used. Also, where are the annotations for those 46 images? If they were still in trainvalno5k.json they would cause errors, so I guess they have been deleted?
The bounding boxes removed from the original trainval2014 to generate trainvalno5k.json match the instances added to 5k.json. Does that mean the bounding boxes belonging to the 1023 (6023 - 5000) missing images are still in trainvalno5k.json, even though the images themselves have been removed?
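For anyone who wants to reproduce these counts, here is a minimal sketch of the checks I have in mind. It assumes all files follow the standard COCO instances layout (an `images` list with `id` fields and an `annotations` list with `id` and `image_id` fields). The helper names (`load`, `compare_splits`, and so on) are my own and are not part of this repository.

```python
import json


def load(path):
    """Read one COCO-style annotation file into a dict."""
    with open(path) as f:
        return json.load(f)


def image_ids(coco):
    """IDs of every image listed in the file."""
    return {img["id"] for img in coco.get("images", [])}


def annotated_ids(coco):
    """IDs of images referenced by at least one annotation."""
    return {ann["image_id"] for ann in coco.get("annotations", [])}


def annotation_ids(coco):
    """IDs of the annotations (bounding boxes) themselves."""
    return {ann["id"] for ann in coco.get("annotations", [])}


def compare_splits(original_parts, eval_split, train_split):
    """Compare two derived splits against the files they were built from."""
    original_images, original_anns = set(), set()
    for part in original_parts:
        original_images |= image_ids(part)
        original_anns |= annotation_ids(part)

    eval_images = image_ids(eval_split)
    train_images = image_ids(train_split)
    removed_anns = original_anns - annotation_ids(train_split)

    return {
        # images present in the original data but in neither split
        "missing": original_images - eval_images - train_images,
        # images listed in the eval split that no box refers to
        "eval_without_boxes": eval_images - annotated_ids(eval_split),
        # images referenced by training boxes but absent from the training image list
        "train_dangling": annotated_ids(train_split) - train_images,
        # boxes removed from the training data that did not end up in the eval split
        "removed_not_in_eval": removed_anns - annotation_ids(eval_split),
        # images that appear in both splits
        "overlap": eval_images & train_images,
    }
```

Calling `compare_splits` with the loaded train2014 and val2014 instance files as `original_parts`, 5k.json as `eval_split`, and trainvalno5k.json as `train_split` should give sets whose sizes can be compared with the numbers above.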

I know these issues are unlikely to cause any significant problem, but still, I'd like to see the rationale behind this. Is the script that made those splits available anywhere?

bingykang / Fewshot_Detection