How is AP(novel) > AP(base) on COCO?

That's a good question!

I have a few guesses, though none of them are confirmed:

COCO has more ambiguous categories than VOC. When the prediction branch is re-initialized and re-trained in the few-shot fine-tuning stage, base-class performance usually drops severely, because the network has to perform multi-class classification and predict class-specific box locations for each RoI. The drop is more pronounced on COCO because our network conditions its output on the combination of query features and class features: the more ambiguous the class features, the less precise the outputs.
The class-agnostic box regression branch proposed in TFA seems to handle this issue well by using the same box regressor for all classes. I think this could benefit most current few-shot object detection networks that use a class-specific prediction bin.
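
To make the difference between the two regression designs concrete, here is a minimal sketch in plain Python. It is not the code of TFA or of this repository; the function names are hypothetical, and the 60 base / 20 novel split in the usage note is only an illustrative assumption. It shows that a class-specific head changes its output size when novel classes are added (so it has to be re-initialized), while a class-agnostic head keeps the same 4 outputs.

```python
from typing import List, Sequence


def box_head_output_dim(num_classes: int, class_agnostic: bool) -> int:
    """Number of box-delta outputs per RoI (4 deltas: dx, dy, dw, dh)."""
    # Class-agnostic: one shared set of 4 deltas.
    # Class-specific: one set of 4 deltas ("bin") per class.
    return 4 if class_agnostic else 4 * num_classes


def select_box_deltas(
    deltas: Sequence[float], class_id: int, class_agnostic: bool
) -> List[float]:
    """Pick the 4 deltas used to refine the box for the predicted class."""
    if class_agnostic:
        # The same regressor output is used whatever the class is.
        return list(deltas[:4])
    # Class-specific: the slice depends on the classification result, so a
    # confused class prediction also selects the wrong regressor.
    start = 4 * class_id
    return list(deltas[start:start + 4])


def head_must_be_resized(
    num_base: int, num_novel: int, class_agnostic: bool
) -> bool:
    """True if adding novel classes changes the regression output size."""
    before = box_head_output_dim(num_base, class_agnostic)
    after = box_head_output_dim(num_base + num_novel, class_agnostic)
    return before != after
```

Under the assumed split, `head_must_be_resized(60, 20, class_agnostic=False)` is true because the output grows from 240 to 320 values, whereas the class-agnostic variant stays at 4 values and its base-trained weights could in principle be kept during fine-tuning.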

YoungXIAO13 / FewShotDetection