alirezazareian / ovr-cnn

A new framework for open-vocabulary object detection, based on maskrcnn-benchmark
MIT License

Why FREEZE_CONV_BODY_AT: 2 in configs/zeroshot_v06.yaml? #9

Closed by mmaaz60 3 years ago

mmaaz60 commented 3 years ago

In the paper, it is written that you freeze the first 3 layers (stem and 2 ResNet blocks) during fine-tuning, but the best-performing configuration, zeroshot_v06.yaml, freezes only the first two layers (stem and 1 ResNet block). Why is that?

Thank you

alirezazareian commented 3 years ago

I have tried various freezing configurations (i.e. 0, 1, 2, 3), and they all result in similar performance. The only important thing is to freeze the V2L layer and NOT freeze the 4th ResNet block. This is because the 4th block undergoes a distribution change: during pretraining its input is a feature map, while during fine-tuning and testing its input is proposal features.
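
For readers following along, here is a minimal sketch of how a FREEZE_CONV_BODY_AT value maps to frozen stages. It assumes the convention described in the question: index 0 is the stem, indices 1 to 4 are the ResNet blocks, and a value of N freezes the first N stages. The stage names and both helper functions are hypothetical and for illustration only, not code from this repository. Freezing the V2L layer is a separate setting and is not modelled here.

```python
# Hypothetical stage names: index 0 is the stem, indices 1-4 are the ResNet blocks.
STAGES = ["stem", "layer1", "layer2", "layer3", "layer4"]


def frozen_stages(freeze_at):
    """Return the stages frozen for a given FREEZE_CONV_BODY_AT value.

    Assumes a value of N freezes the first N stages, counting the stem
    as the first one. A negative value freezes nothing.
    """
    if freeze_at < 0:
        return []
    return STAGES[:freeze_at]


def keeps_fourth_block_trainable(freeze_at):
    """Check the constraint from the reply above: the 4th block stays trainable."""
    return "layer4" not in frozen_stages(freeze_at)


if __name__ == "__main__":
    # The settings reported above as performing similarly.
    for value in (0, 1, 2, 3):
        print(value, frozen_stages(value), keeps_fourth_block_trainable(value))
```

Under this assumption, a value of 2 (as in zeroshot_v06.yaml) freezes the stem and one ResNet block, a value of 3 matches the paper's description, and every value from 0 to 3 leaves the 4th block trainable.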