YoungXIAO13 / FewShotDetection

(ECCV 2020) PyTorch implementation of paper "Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild"
http://imagine.enpc.fr/~xiaoy/FSDetView/
MIT License
209 stars, 33 forks

How is AP(novel) > AP(base) on COCO? #30

Open arvind-iyer opened 3 years ago

arvind-iyer commented 3 years ago

Correct me if I am mistaken, but the reported AP values for base and novel classes might be swapped. Since the model sees only 1/3 as many novel samples as base samples, we would expect detection accuracy on the novel classes to be correspondingly lower. The reported accuracies for VOC seem to follow this expectation. What is different about COCO that skews the results the other way?

YoungXIAO13 commented 3 years ago

That's a good question!

I have a few guesses, though nothing confirmed:

  1. COCO has more ambiguous categories than VOC. When the prediction branch is re-initialized and re-trained in the few-shot fine-tuning stage, the performance on base classes usually drops severely, as the network tries to perform multi-class classification and predict class-specific box locations for each RoI. That drop is much more obvious on COCO because our network conditions its output on the combination of query features and class features: the more ambiguous the class features are, the less precise the outputs.

  2. The class-agnostic box regression branch proposed in TFA seems to handle this issue very well by always using the same box regressor for all classes. I think this could benefit most current few-shot object detection networks that use a specific prediction bin for each class.
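
To make the difference in point 2 concrete, here is a minimal sketch in plain Python of how the two kinds of box regression output are sized and read per RoI. The function names `box_head_output_sizes` and `select_deltas` are hypothetical and not taken from this repository or from TFA. The sketch assumes one background logit plus four box deltas per foreground class in the class-specific case; real implementations may count the background class differently.

```python
def box_head_output_sizes(num_classes, class_agnostic):
    """Return (classification logits, box-regression outputs) per RoI.

    Assumption: one extra logit for background, and 4 deltas
    (dx, dy, dw, dh) per foreground class when class-specific.
    """
    cls_out = num_classes + 1
    reg_out = 4 if class_agnostic else 4 * num_classes
    return cls_out, reg_out


def select_deltas(reg_output, class_idx, class_agnostic):
    """Pick the 4 box deltas applied to a RoI assigned to class_idx.

    class_idx is a 0-based foreground class index (hypothetical convention).
    """
    if class_agnostic:
        # One shared regressor: the same 4 values serve every class.
        return reg_output[:4]
    # Class-specific: each class owns its own slice ("bin") of the output.
    start = 4 * class_idx
    return reg_output[start:start + 4]


# With the 80 COCO categories, a class-specific head emits 320 regression
# values per RoI, while a class-agnostic head emits 4.
specific = box_head_output_sizes(80, class_agnostic=False)
agnostic = box_head_output_sizes(80, class_agnostic=True)
```

In the class-specific case every class has its own slice of regression weights, and all of them are re-learned when the prediction branch is re-initialized for fine-tuning. With a shared regressor, the box deltas do not depend on which class the RoI is assigned to.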