er-muyue / DeFRCN

MIT License

When using a different pretrained ResNet backbone to train DeFRCN, novel-class few-shot detection performance drops a lot #38

Open miznchimaki opened 2 years ago

miznchimaki commented 2 years ago

Thanks for your great work! I noticed that you used MSRA's pretrained ResNet-101 as the initial weights. When I switched to MSRA's pretrained ResNet-50 to train DeFRCN on the base and novel classes, I got normal mAP (I also changed the PCB model to torchvision's pretrained ResNet-50). The following two screenshots show my reproduced results with MSRA's ResNet-50 as the backbone. The first is the PASCAL VOC split 1 base-class evaluation, and the second is the PASCAL VOC split 1, 1-shot, seed 0, repeat 0 fsod FSRW-like result.

[Screenshots 1 and 2: results with MSRA's ResNet-50 backbone]

Then I changed the pretrained backbone to torchvision's ResNet-50. Because torchvision's ResNet-50 expects RGB input, I changed the input format and the corresponding pixel mean and std values in Detectron2's config file. Following Detectron2's instructions, I also changed MODEL.RESNETS.STRIDE_IN_1X1 to False (because I used torchvision's pretrained weights). After these changes, I used Detectron2's detectron2/tools/convert-torchvision-to-d2.py to convert torchvision's pretrained .pth file to a .pkl file, then used the converted .pkl file as the pretrained backbone weights. The following two screenshots show my results with torchvision's ResNet-50 as the backbone. The first is the PASCAL VOC split 1 base-class evaluation, and the second is the PASCAL VOC split 1, 1-shot, seed 0, repeat 0 fsod FSRW-like result.

[Screenshots 3 and 4: results with torchvision's ResNet-50 backbone]

Curiously, the base-training mAP is normal with both the MSRA and the torchvision ResNet-50 pretrained models, but the novel-class few-shot performance with the torchvision model is much lower than with the MSRA one. The other shots, seeds, and repeats show the same pattern. What causes this? Is there anything I overlooked?
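For reference, here is a minimal plain-Python sketch (no Detectron2 import) of the config values that differ between the two backbones. The MSRA column is Detectron2's default config, which assumes MSRA/Caffe2 weights whose std is folded into conv1, hence a std of 1. The torchvision column is what I understand the docstring of convert-torchvision-to-d2.py to recommend. The dict names and the `normalize` / `changed_keys` helpers are hypothetical, written only to show how differently a pixel is scaled under each setting.

```python
"""Sketch: Detectron2 config values that differ between an MSRA (Caffe2)
ResNet backbone and a torchvision one converted with
tools/convert-torchvision-to-d2.py. Dict and helper names are hypothetical."""

# Detectron2 defaults, which match the MSRA ImageNet-pretrained weights.
MSRA_R50 = {
    "INPUT.FORMAT": "BGR",
    "MODEL.PIXEL_MEAN": [103.530, 116.280, 123.675],  # B, G, R
    "MODEL.PIXEL_STD": [1.0, 1.0, 1.0],               # std absorbed into conv1
    "MODEL.RESNETS.STRIDE_IN_1X1": True,               # stride on the 1x1 conv
}

# Values suggested for a torchvision checkpoint converted to .pkl.
TORCHVISION_R50 = {
    "INPUT.FORMAT": "RGB",
    "MODEL.PIXEL_MEAN": [123.675, 116.280, 103.530],  # R, G, B
    "MODEL.PIXEL_STD": [58.395, 57.120, 57.375],
    "MODEL.RESNETS.STRIDE_IN_1X1": False,              # stride on the 3x3 conv
}


def normalize(pixel, cfg):
    """Per-channel (x - mean) / std, as Detectron2 normalizes input images.

    `pixel` must already be in the channel order given by INPUT.FORMAT.
    """
    mean, std = cfg["MODEL.PIXEL_MEAN"], cfg["MODEL.PIXEL_STD"]
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


def changed_keys(a, b):
    """Sorted config keys whose values differ between two override dicts."""
    return sorted(k for k in a if a[k] != b.get(k))


if __name__ == "__main__":
    print(changed_keys(MSRA_R50, TORCHVISION_R50))
    # A white pixel lands in very different numeric ranges under each setting.
    print(normalize([255, 255, 255], MSRA_R50))
    print(normalize([255, 255, 255], TORCHVISION_R50))
```

Note that the torchvision mean is simply the MSRA mean in reversed channel order. The real difference is the std scaling, which is why the mean, std, and INPUT.FORMAT have to be changed together.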

Additionally, if I don't use Detectron2's detectron2/tools/convert-torchvision-to-d2.py to convert torchvision's pretrained .pth file to a .pkl file (i.e., the model loads weights directly from torchvision's ResNet-50 .pth file), the base-training mAP drops a lot. What is the reason for this?
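One possibly relevant detail is that the conversion script does more than change the file format. As far as I can tell from reading detectron2/tools/convert-torchvision-to-d2.py, it also renames every parameter from torchvision's naming scheme to Detectron2's. Below is a minimal sketch that restates those renaming rules on plain strings. The function name is hypothetical, and the sketch is not the script itself, which additionally converts tensors to NumPy arrays and pickles them. If the raw .pth keys (e.g. `layer1.0.conv1.weight`) are loaded without this renaming, they would not match Detectron2's backbone names (e.g. `res2.0.conv1.weight`). Those backbone weights could then be skipped, leaving them at their random initialization.

```python
"""Sketch of the key renaming in Detectron2's convert-torchvision-to-d2.py,
restated on plain strings. `torchvision_to_d2_key` is a hypothetical name."""


def torchvision_to_d2_key(key):
    """Map a torchvision ResNet state_dict key to a Detectron2 backbone key."""
    if "layer" not in key:
        # conv1 / bn1 go under the stem (fc also gets this prefix,
        # but the detection backbone does not use it).
        key = "stem." + key
    for t in [1, 2, 3, 4]:
        key = key.replace(f"layer{t}", f"res{t + 1}")  # layer1..4 -> res2..5
    for t in [1, 2, 3]:
        key = key.replace(f"bn{t}", f"conv{t}.norm")  # BN attached to its conv
    key = key.replace("downsample.0", "shortcut")       # projection conv
    key = key.replace("downsample.1", "shortcut.norm")  # projection BN
    return key


if __name__ == "__main__":
    for k in ["conv1.weight", "bn1.running_mean", "layer1.0.bn2.weight",
              "layer2.0.downsample.0.weight", "layer4.0.downsample.1.bias"]:
        print(k, "->", torchvision_to_d2_key(k))
```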

Looking forward to your response!