Code for making dic_coco.npy and the prior stat_prob.npy

LinkToPast1990 commented 4 years ago

Could you also share the code for making dic_coco.npy and the prior stat_prob.npy? Thanks

And in order to construct dic_coco.npy with ground-truth bboxes, I should modify modeling/detector/generalized_rcnn.py in maskrcnn-benchmark as follows, right?

    # we directly use bounding box coordinates from ground truth label
    if self.training:
        proposals = [target for target in targets]
    else:
        devices = features[0].get_device()
        proposals = [target.to(devices) for target in targets]

Wangt-CN commented 4 years ago

Hi, thanks for your interest in our work :)

dic_coco.npy: As described in our paper, dic_coco.npy is the pre-calculated confounder dictionary, which contains the averaged RoI feature of each category in MSCOCO (80 classes). We use a pretrained Faster R-CNN model to generate it, so the code is just the Faster R-CNN that can be found in maskrcnn-benchmark. You can also use whichever codebase you are familiar with (e.g. mmdetection). After extracting the RoI features for every image in MSCOCO, we average the RoI feature vectors of each category to get dic_coco.npy.
stat_prob.npy: This is calculated from the appearance frequency of each object category in the MSCOCO dataset, so it can be computed using only the annotations of MSCOCO train2014. For convenience, I use the cocoapi. The key code is:
```
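from pycocotools.coco import COCO

# Assumed path to the train2014 annotation file; adjust to your setup
coco = COCO('annotations/instances_train2014.json')

def p_z(z):
    # z is the object category name
    catIds_ann = coco.getCatIds(catNms=[z])
    annIds = coco.getAnnIds(catIds=catIds_ann, iscrowd=None)
    # number of annotations of category z in the dataset
    length = len(annIds)
    # 604907 is the number of annotations in train2014
    return length / 604907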
```
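To get the whole prior vector rather than one category at a time, the same counting can be done directly on the annotation dictionary. This is only a minimal sketch, not the code used to produce the released stat_prob.npy: the function name `category_prior` is hypothetical, the ordering by sorted category id is an assumption (it has to match whatever ordering dic_coco.npy uses), and the total is taken from the file instead of the hard-coded 604907.

```python
from collections import Counter


def category_prior(dataset):
    """Return (category_ids, probabilities) from a COCO-style annotation dict.

    `dataset` is the parsed annotation JSON: it must have a "categories" list
    (each entry with an "id") and an "annotations" list (each entry with a
    "category_id"). Ordering by sorted category id is an assumption.
    """
    category_ids = sorted(cat["id"] for cat in dataset["categories"])
    counts = Counter(ann["category_id"] for ann in dataset["annotations"])
    total = len(dataset["annotations"])
    # Frequency of each category among all annotations
    probs = [counts[cid] / total for cid in category_ids]
    return category_ids, probs


# Toy stand-in for the real annotation file (hypothetical data)
toy = {
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [
        {"category_id": 1},
        {"category_id": 1},
        {"category_id": 1},
        {"category_id": 3},
    ],
}
ids, prior = category_prior(toy)
```

On the real data, `dataset` would be the result of `json.load` on the train2014 instances file, and the resulting list could be written out with `numpy.save` to get a .npy file.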

Wangt-CN commented 4 years ago

Yes, you are right! BTW, I have actually tried extracting the pretrained features for constructing z both with the ground-truth bounding boxes and with the boxes predicted by the original pretrained Faster R-CNN. I found that the difference between the two, and in downstream task performance, is small. The probable reason is that the features are averaged over the whole of MSCOCO and current detection systems are quite reliable. But I still think using the GT bounding boxes may be slightly better.
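For reference, the per-category averaging that produces the dictionary can be sketched as below. This is an illustration under assumptions, not the exact script: the function name `build_confounder_dictionary` is hypothetical, labels are assumed to be already mapped to contiguous indices 0..79 (raw COCO category ids are not contiguous), and the RoI features are assumed to be stacked into one array.

```python
import numpy as np


def build_confounder_dictionary(features, labels, num_classes=80):
    """Average RoI features per category.

    features: (N, D) array, one RoI feature per box
    labels:   (N,) integer array with values in [0, num_classes)
    returns:  (num_classes, D) array; rows of absent categories stay zero
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    sums = np.zeros((num_classes, features.shape[1]), dtype=np.float64)
    counts = np.zeros(num_classes, dtype=np.int64)
    # np.add.at accumulates correctly when the same label appears many times
    np.add.at(sums, labels, features)
    np.add.at(counts, labels, 1)
    # Guard against division by zero for categories with no boxes
    return sums / np.maximum(counts, 1)[:, None]


# Toy example (hypothetical data): 4 boxes, 2-dim features, 3 categories
toy_features = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [0.0, 0.0]])
toy_labels = np.array([0, 0, 2, 2])
toy_dic = build_confounder_dictionary(toy_features, toy_labels, num_classes=3)
# np.save("dic_coco.npy", toy_dic)  # uncomment to write the array to disk
```

Whether the boxes come from the ground truth or from the detector only changes which `features` and `labels` are fed in; the averaging itself is the same.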

LinkToPast1990 commented 4 years ago

Thanks! Besides z, the paper also uses the GT boxes to train VC R-CNN but uses boxes from the prediction model at test time. Could this lead to a train/test shift?

Wangt-CN commented 4 years ago

Hi, actually VC R-CNN doesn't have a testing procedure (it's a feature extraction procedure). The VC R-CNN training procedure is used to learn an image feature embedding. Then, when extracting features, VC R-CNN can be regarded as a feature extractor, and any bounding box coordinates are fine.

Wangt-CN / VC-R-CNN

Code for making dic_coco.npy and the prior stat_prob.npy #3