YoungXIAO13 / FewShotDetection

(ECCV 2020) PyTorch implementation of paper "Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild"
http://imagine.enpc.fr/~xiaoy/FSDetView/
MIT License

How do you ensure fair data usage? #18

Closed ZhangGongjie closed 3 years ago

ZhangGongjie commented 3 years ago

You created two datasets: one is the meta (support) dataset, and the other is the training dataset. Both have been filtered to contain only K training samples per novel category (the K from K-shot).

However, I cannot find the code that ensures the samples in the meta dataset and the training dataset are exactly the same. If the two datasets are mutually exclusive, then the total number of training samples used would be 2x the correct number.

Please kindly correct me if I am missing something here. Thank you very much! @YoungXIAO13

YoungXIAO13 commented 3 years ago

Hi @ZhangGongjie ,

Actually, the training dataset in the few-shot fine-tuning stage uses exactly the same training samples as the meta dataset. For example, the VOC data loader in the second stage is initialised from voc_2007_shots, which is generated when building the meta dataset.

Using this configuration, the meta dataset and the training dataset share the same samples, so the total is K samples per novel category, as K-shot requires.
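As a minimal illustration (the file format and variable names here are assumptions for the sketch, not taken from the repo): both stages are built from the same K-shot list, so no extra novel-class samples are introduced.

```python
# Sketch: the shot list generated for the meta/support dataset is reused
# verbatim to build the fine-tuning dataset, so the two branches see the
# same K samples rather than two independent draws (which would give 2K).

# Pretend this was read from the generated voc_2007_shots file
# (the (image_id, category) format is an assumption for illustration).
shot_list = [
    ("000012", "bird"),
    ("000017", "bird"),
    ("000023", "sofa"),
]

meta_dataset = list(shot_list)      # support branch
finetune_dataset = list(shot_list)  # detection fine-tuning branch

# Same samples in both -> K per category in total, not 2K.
assert meta_dataset == finetune_dataset
```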

ZhangGongjie commented 3 years ago

Thank you so much for your clarification!

One more issue: could you please clarify whether K-shot means K bounding box(es) for each novel category, or K image(s) for each category (in which case the number of object instances may be larger than K)?

YoungXIAO13 commented 3 years ago

Sure, K counts object instances (bounding boxes), not images. When building the meta dataset, the shot counter for a category is incremented each time an instance of that category is added.
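To make the counting explicit, here is a small sketch (the annotation format is an assumption for illustration) showing that the quantity K refers to is the number of boxes, so a single image can contribute more than one shot for a category:

```python
from collections import defaultdict

def count_instances(annotations):
    """Count annotated boxes (not images) per category; this is the
    quantity that K in 'K-shot' refers to."""
    counts = defaultdict(int)
    for image_anns in annotations:
        for category, _bbox in image_anns:
            counts[category] += 1
    return dict(counts)

# Two images; the first holds two 'dog' boxes, so 'dog' already
# reaches 2-shot from a single image.
anns = [
    [("dog", (10, 10, 50, 50)), ("dog", (60, 60, 90, 90))],
    [("cat", (5, 5, 40, 40))],
]
print(count_instances(anns))  # {'dog': 2, 'cat': 1}
```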

ZhangGongjie commented 3 years ago

An image can contain multiple bounding boxes of multiple categories. How do you ensure that the number of instances per category is exactly K? Simply by discarding some of the annotations?

YoungXIAO13 commented 3 years ago

This sampling strategy was initially proposed in Meta R-CNN, where the annotations beyond K are simply discarded. It is also used in the more recent paper TFA.

Personally, I don't think this is the perfect sampling strategy for few-shot object detection, since it can create false negatives in the ground truth, but I followed it for a direct comparison when I was working on this project. You can surely try a different sampling strategy, as long as you keep the number of annotated instances at K for each novel category.
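A minimal sketch of this discard-beyond-K strategy (function name and annotation format are assumptions for illustration, not the repo's actual code):

```python
from collections import defaultdict

def sample_k_shot(annotations, k):
    """Keep at most k boxes per category; boxes beyond k are simply
    dropped. Note the side effect: dropped boxes leave real objects
    unlabeled in their images (the false-negative issue above)."""
    kept = defaultdict(int)
    sampled = []
    for image_id, category, bbox in annotations:
        if kept[category] < k:
            kept[category] += 1
            sampled.append((image_id, category, bbox))
    return sampled

anns = [
    ("img1", "dog", (0, 0, 10, 10)),
    ("img1", "dog", (20, 20, 30, 30)),
    ("img2", "dog", (0, 0, 10, 10)),   # third dog box -> dropped at k=2
    ("img2", "cat", (5, 5, 15, 15)),
]
kept = sample_k_shot(anns, k=2)
print(len(kept))  # 3 boxes kept: 2 dogs + 1 cat
```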

ZhangGongjie commented 3 years ago

Thanks for your clarification. Thank you so much!