Closed Cohesion97 closed 2 years ago
Hi, we've explored both options. The default option (used in the paper) averages a SwAV ResNet50 intermediate feature map. The other option uses the feature vector output by the SwAV projector. Holding everything else fixed, both options perform similarly well.
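To make the difference between the two options concrete, here is a rough, dependency-free sketch. It is not the code from `models/backbone.py`: the function names, the toy sizes (8 channels, 4-dim head output) and the single random linear layer standing in for the SwAV projector are all assumptions for illustration. It also assumes the projector consumes the pooled feature, which may not match the repo exactly. In the issue, the corresponding sizes are dim=512 ('intermediate') and dim=128 ('head').

```python
import math
import random


def average_pool(feature_map):
    """Average a C x H x W feature map over its spatial positions -> C values."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def project(vector, weights):
    """Apply a linear map (out_dim x in_dim), then L2-normalize the result.

    Hypothetical stand-in for a projector head; the real SwAV projector
    is a small MLP, not a single linear layer.
    """
    out = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    norm = math.sqrt(sum(o * o for o in out)) or 1.0
    return [o / norm for o in out]


def embedding_target(feature_map, weights, mode="intermediate"):
    """Return the embedding target for one crop under the chosen option."""
    pooled = average_pool(feature_map)
    if mode == "intermediate":
        return pooled  # option 1: averaged feature map
    if mode == "head":
        return project(pooled, weights)  # option 2: projector output
    raise ValueError(f"unknown mode: {mode!r}")


# Toy sizes only (assumed): 8 channels, 4x4 spatial grid, 4-dim head output.
random.seed(0)
channels, height, width, head_dim = 8, 4, 4, 4
feature_map = [[[random.gauss(0, 1) for _ in range(width)]
                for _ in range(height)]
               for _ in range(channels)]
weights = [[random.gauss(0, 1) for _ in range(channels)]
           for _ in range(head_dim)]

intermediate = embedding_target(feature_map, weights, mode="intermediate")
head = embedding_target(feature_map, weights, mode="head")
print(len(intermediate), len(head))
```

The only thing that changes between the two modes is the target vector and its dimensionality, so the layer that predicts the embedding would need a matching output size in each case.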
https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177
It seems 'head' is a new training setting that uses dim=128 to align features, whereas the paper uses dim=512 ('intermediate'). Does this mean we should switch to dim=128 ('head') to get better performance from DETReg?
Thanks.