YoungXIAO13 / FewShotDetection

(ECCV 2020) PyTorch implementation of paper "Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild"
http://imagine.enpc.fr/~xiaoy/FSDetView/
MIT License

Question about train and test #20

Open lizhiyuanUSTC opened 3 years ago

lizhiyuanUSTC commented 3 years ago

Thanks for your great work!

I have some questions about the training and test settings. During training, no matter which class attention vector is combined with the query feature, the ground truth is the same. At test time, however, the class scores are tied to the attention vector, and the background score is taken from the first attention vector. The loss function does not build any connection between the predicted class and the class of the attention vector.

Is this strange?

YoungXIAO13 commented 3 years ago

Hi @lizhiyuanUSTC

Thanks for your question!

For the training procedure as well as the loss function, I simply follow the implementation proposed in Meta R-CNN (see this line).

It's true that the ground truth remains the same regardless of the class vector in training, while the output with the highest classification score is selected in testing.

My understanding is that the box regression branch should be class-agnostic (as discussed in TFA), which means the regression output depends mainly on the RoI features, with weak (or even zero) dependence on the class feature. As for the classification branch, it tries to classify which class the object in the RoI belongs to. In the supervised setting, this branch could work by looking only at the RoI feature and learning a specific weight for each class. With the class feature vector, we have four possible cases:

  1. both the class vector and the output class are the object class: the loss function requires a high classification score
  2. the class vector is the object class while the output class is not: the loss function requires a low classification score
  3. the class vector is not the object class while the output class is: the loss function requires a high classification score
  4. neither the class vector nor the output class is the object class: the loss function requires a low classification score

By using the same class vector during training, we force the classification branch to be robust in all of these cases without relying on any specific configuration. This should work because, when there are many samples, the network can simply ignore the class feature vector and condition the output only on the RoI features, while adding the class vector feature surely helps in few-shot cases (as demonstrated in FSRW and Meta R-CNN). In testing, since a class-specific prediction is required, for each combination of RoI and class we simply combine the RoI feature with the corresponding class vector feature.

lizhiyuanUSTC commented 3 years ago

Thanks for your reply.

Can I say that the attention vector is a special kind of feature augmentation in few-shot object detection? I noticed that you did not freeze any parameters in the meta test, but training all parameters in TFA leads to fast overfitting on the novel training samples.

You say the output with the highest classification score is selected in testing, but I do not see the related code in this repo. Please correct me if I am wrong.

YoungXIAO13 commented 3 years ago

If "feature augmentation" means adding additional information to the RoI features, then "yes".
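A rough sketch of what "adding additional information to the RoI features" can look like, assuming an aggregation that concatenates the element-wise product, the difference, and the original RoI feature (Meta R-CNN style reweighting would keep only the product). The function name and the plain-list representation are hypothetical and not taken from this repo.

```python
def aggregate(roi_feat, class_vec):
    """Combine an RoI feature with a class vector of the same length.

    Assumed form: [roi * cls, roi - cls, roi], concatenated along the
    channel dimension. This is an illustration, not the repo's code.
    """
    if len(roi_feat) != len(class_vec):
        raise ValueError("roi_feat and class_vec must have the same length")
    product = [f * a for f, a in zip(roi_feat, class_vec)]
    difference = [f - a for f, a in zip(roi_feat, class_vec)]
    return product + difference + list(roi_feat)

roi_feat = [1.0, 2.0]
class_vec = [0.5, 0.0]
augmented = aggregate(roi_feat, class_vec)  # [0.5, 0.0, 0.5, 2.0, 1.0, 2.0]
```

The original RoI feature is kept as the last block, so the heads can still ignore the class vector when it is not informative.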

I didn't try freezing the backbone during the second stage, but I agree that it is worth a try.

Sorry for the misleading phrase, I've corrected it in the answer above.

lizhiyuanUSTC commented 3 years ago

Thanks for your reply.

I tried predicting the class scores and box deltas with different attention vectors, and the results are very close.

I noticed that the released pretrained model on base classes can classify all categories, so I tried running the meta test with the lr set to 0 (only to get the attention vectors). Strangely, the mAP on the COCO novel classes reaches 10.x across different seeds, which means that the proposed method can achieve good results without fine-tuning.

YoungXIAO13 commented 3 years ago

That's an interesting finding!

However, I don't quite understand this part: "I noticed that the released pretrained model on base classes can classify all categories". The box classification and regression branches only have 60 classes after the base training, so they should not work on the 20 novel classes, no?

lizhiyuanUSTC commented 3 years ago

I downloaded the pretrained weights on base classes from sh, and the shape of RCNN_cls_score.weight is [81, 4096].

YoungXIAO13 commented 3 years ago

Ah, I forgot that the num_class is always set to be 81 for the dataset COCO.
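For anyone who wants to run the same check, here is a small sketch that interprets the shape of the classification head. The helper name is hypothetical; the class counts (80 COCO categories, 20 novel, 60 base, plus one background output) are the ones discussed in this thread.

```python
COCO_CLASSES = 80                            # all COCO categories
NOVEL_CLASSES = 20                           # novel split mentioned in this thread
BASE_CLASSES = COCO_CLASSES - NOVEL_CLASSES  # 60

def describe_cls_head(weight_shape):
    """Interpret the shape of a classification-head weight matrix."""
    num_outputs, feat_dim = weight_shape
    foreground = num_outputs - 1  # one output is reserved for background
    return {
        "num_outputs": num_outputs,
        "feature_dim": feat_dim,
        "foreground_classes": foreground,
        "covers_all_coco": foreground == COCO_CLASSES,
        "base_only": foreground == BASE_CLASSES,
    }

# Shape reported above for RCNN_cls_score.weight.
# With PyTorch it could be read as tuple(state["RCNN_cls_score.weight"].shape)
# after torch.load(...); how the checkpoint dict is nested is an assumption.
info = describe_cls_head((81, 4096))
```

A head with 81 outputs only shows that slots for the novel classes exist in the checkpoint; it does not by itself say how those rows were trained.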

lizhiyuanUSTC commented 3 years ago

Can you test on the novel classes using the released pretrained model? I cannot understand why this happens.

I cannot replicate your results in my own experiments unless I use your released model. Is there any trick needed to achieve comparable results?