lasuomela closed this 1 year ago
Hi, great PR! Just a minor thing: I'd prefer not to pass args to TestDataset. Could you pass only the two parameters image_size and resize_test_imgs, as the last parameters, with default values of 512 and False?
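For concreteness, a rough sketch of what the requested signature could look like; dataset_folder and positive_dist_threshold are stand-ins for whatever TestDataset already takes, not the actual parameters:

```python
from torch.utils.data import Dataset

class TestDataset(Dataset):
    def __init__(self, dataset_folder, positive_dist_threshold=25,
                 image_size=512, resize_test_imgs=False):
        # The two new parameters come last and default to the current
        # behavior: 512x512 images, no resizing of test images.
        self.dataset_folder = dataset_folder
        self.positive_dist_threshold = positive_dist_threshold
        self.image_size = image_size
        self.resize_test_imgs = resize_test_imgs
```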
When training with image sizes that drastically differ from the image sizes used for pretraining the backbones, all backbone layers should be set to trainable using the --train_all_layers arg.
Do you have any evidence for this? In my experience the lower layers do not need to be touched if they are properly pretrained, but I have never trained at resolutions as small as in your paper.
Hi, thanks!
I made the requested changes to TestDataset.
Regarding frozen backbones:
```
train.py --image_size 85 --resize_test_imgs --train_all_layers
< test - #q: 1000; #db: 2805840 >: R@1: 21.3, R@5: 32.4, R@10: 38.9, R@20: 45.2

train.py --image_size 85 --resize_test_imgs
< test - #q: 1000; #db: 2805840 >: R@1: 14.8, R@5: 24.7, R@10: 29.4, R@20: 34.6
```
So I think training all layers is justified. I don't know whether the same would hold when training with very large-resolution images.
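For anyone reading along, a rough sketch of the kind of freezing being compared here. This is illustrative only (a torchvision ResNet-18 standing in for the actual backbone), not the repo's code:

```python
import torchvision

def build_backbone(train_all_layers: bool):
    # Pretrained ResNet-18 as a stand-in backbone.
    backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    if not train_all_layers:
        for name, param in backbone.named_parameters():
            # Default setting: freeze everything except the last
            # residual block (and the classification head, which is
            # typically replaced downstream anyway).
            if not name.startswith(("layer4", "fc")):
                param.requires_grad = False
    return backbone
```

With --train_all_layers, every parameter keeps requires_grad=True, which is what lets the early layers adapt to the 85x85 inputs.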
Let me know if any additional changes are needed!
PR is merged, thank you for the contribution! Your results clearly show that training all layers is useful at such low resolutions; that is indeed very interesting.
These changes enable training the network with image sizes other than 512x512 (arg: --image_size).
Optionally, the test images can also be resized (arg: --resize_test_imgs).
When training with image sizes that drastically differ from the image sizes used for pretraining the backbones, all backbone layers should be set to trainable using the --train_all_layers arg.
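A hedged sketch of how these three arguments could be declared (the help strings and parser setup are assumptions for illustration, not the repo's actual parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--image_size", type=int, default=512,
                    help="Side of the square training images (default 512x512).")
parser.add_argument("--resize_test_imgs", action="store_true",
                    help="Also resize test images to --image_size.")
parser.add_argument("--train_all_layers", action="store_true",
                    help="Set all backbone layers to trainable.")
args = parser.parse_args()
```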