I found a problem when loading the pre-trained file 'vg-faster-rcnn.tar'. The anchor ratios and anchor scales in neural-motifs are inconsistent with those in torchvision.models.detection:
motifs: anchor ratios (0.23232838, 0.63365731, 1.28478321, 3.15089189); scales (2.22152954, 4.12315647, 7.21692515, 12.60263013, 22.7102731)
torchvision: anchor ratios (0.5, 1.0, 2.0); scales (32, 64, 128, 256, 512)
As a result, the pre-trained weights in 'vg-faster-rcnn.tar' do not match torchvision's shape for rpn.head.bbox_pred: (120, 512, 1, 1) in the checkpoint vs (60, 512, 1, 1) in torchvision.
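The two shapes can be checked by counting anchors per feature-map location. The sketch below is plain arithmetic; the helper names are hypothetical. It assumes torchvision is configured with a single feature map that uses all five sizes at every location (which gives 15 anchors and 4 box deltas each, i.e. 60 channels). It also assumes the motifs RPN head packs 2 objectness scores plus 4 box deltas (6 values) per anchor into one conv, which would explain 120 = 6 × 20. That layout should be confirmed against the repo.

```python
# Hypothetical helpers: anchors per location and the output-channel count
# of a 1x1 RPN prediction conv, for the two anchor configurations above.

MOTIFS_RATIOS = (0.23232838, 0.63365731, 1.28478321, 3.15089189)
MOTIFS_SCALES = (2.22152954, 4.12315647, 7.21692515, 12.60263013, 22.7102731)
TV_RATIOS = (0.5, 1.0, 2.0)
TV_SIZES = (32, 64, 128, 256, 512)


def anchors_per_location(ratios, scales):
    # One anchor for every (ratio, scale) pair at each feature-map location
    return len(ratios) * len(scales)


def pred_channels(num_anchors, values_per_anchor):
    # A 1x1 conv predicting `values_per_anchor` numbers for each anchor
    return num_anchors * values_per_anchor


motifs_anchors = anchors_per_location(MOTIFS_RATIOS, MOTIFS_SCALES)
tv_anchors = anchors_per_location(TV_RATIOS, TV_SIZES)

# torchvision RPNHead.bbox_pred: 4 box deltas per anchor
print(pred_channels(tv_anchors, 4))
# Assumption: motifs head = 2 objectness scores + 4 deltas per anchor
print(pred_channels(motifs_anchors, 6))
```

If that reading is right, the mismatch is more than a reshape problem. The checkpoint's regression weights were trained for 20 anchors with different ratios and scales, so they would not line up with torchvision's 15 anchors even if the channels were rearranged.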
I'm not sure whether my analysis above is correct, or whether this affects the performance of the RPN.
Well, it seems that this repo does not load the weights of rpn.head.bbox_pred at all. I'm unsure whether the detector still works well without a pre-trained RPN, since these parameters are important in the sgdet setting.
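One way to see exactly which layers are skipped is to filter the checkpoint by name and shape before loading. The following is a hypothetical sketch, not this repo's loading code. It assumes the checkpoint keys have already been mapped to the model's parameter names and that the weights sit under a 'state_dict' key. Both assumptions need checking.

```python
# Hypothetical sketch: keep only checkpoint tensors whose name and shape
# match the target model. Mismatched layers (e.g. rpn.head.bbox_pred) are
# reported and left at their fresh initialization.

def split_compatible(pretrained, model_state):
    """Return (compatible, skipped) for two name -> tensor mappings.

    Intended for torch state dicts, but only relies on a `.shape` attribute.
    """
    compatible, skipped = {}, []
    for name, tensor in pretrained.items():
        target = model_state.get(name)
        if target is not None and tuple(target.shape) == tuple(tensor.shape):
            compatible[name] = tensor
        else:
            skipped.append(name)
    return compatible, skipped


# Usage with PyTorch (not executed here; 'state_dict' key is an assumption):
#   ckpt = torch.load("vg-faster-rcnn.tar", map_location="cpu")["state_dict"]
#   compatible, skipped = split_compatible(ckpt, model.state_dict())
#   model.load_state_dict(compatible, strict=False)
#   print("not loaded:", skipped)
```

Printing `skipped` makes it explicit whether rpn.head.bbox_pred (and possibly rpn.head.cls_logits) is being trained from scratch.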