VicenteVivan / geo-clip

This is an official PyTorch implementation of our NeurIPS 2023 paper "GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization"
https://arxiv.org/abs/2309.16020
MIT License

Higher Resolution for GPS Coordinates #7

Closed Skyy93 closed 5 months ago

Skyy93 commented 5 months ago

Thanks for the great work!

Did I understand correctly that the sigma parameter controls the resolution of the frequencies, and that you have to increase it if you need a higher resolution for the GPS coordinates? You use [2**0, 2**4, 2**8] for a resolution of up to one km. What about metre-level resolution?

Thank you very much

VicenteVivan commented 5 months ago

Hi @Skyy93,

Thank you for your interest in our work and your kind words. Yes, you are correct. Based on our experience, to increase the resolution of the GPS coordinate encoder, you could start by adding an extra RFF (random Fourier features) encoding layer, followed by an MLP, to the location encoder with a higher $\sigma$ (e.g. $\sigma = 2^{12}$). In code, this would look like:

from geoclip import LocationEncoder

gps_encoder = LocationEncoder(sigma=[2**0, 2**4, 2**8, 2**12], from_pretrained=False)
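For intuition on why a larger $\sigma$ gives finer resolution, here is a minimal one-dimensional sketch of random Fourier features in plain Python. It is not GeoCLIP's implementation: the helper names `rff_encode` and `rff_similarity`, the number of frequencies, and the shift `d` (in normalized input units) are all assumptions made for illustration. Frequencies are drawn from $\mathcal{N}(0, \sigma^2)$, so the similarity between the encodings of two points a distance $d$ apart behaves roughly like $\exp(-2\pi^2\sigma^2 d^2)$, and the smallest shift the encoding can separate shrinks roughly as $1/\sigma$.

```python
import math
import random


def rff_encode(x, freqs):
    """Encode scalar x as [cos(2*pi*b*x), sin(2*pi*b*x)] for each frequency b."""
    feats = []
    for b in freqs:
        angle = 2.0 * math.pi * b * x
        feats.append(math.cos(angle))
        feats.append(math.sin(angle))
    return feats


def rff_similarity(x1, x2, sigma, num_freqs=256, seed=0):
    """Normalized dot product of the RFF encodings of x1 and x2.

    Frequencies are sampled from N(0, sigma^2). A value near 1 means the
    encoding barely distinguishes the two points; a value near 0 means they
    are well separated. (Illustrative helper, not part of geoclip.)
    """
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, sigma) for _ in range(num_freqs)]
    f1 = rff_encode(x1, freqs)
    f2 = rff_encode(x2, freqs)
    return sum(a * b for a, b in zip(f1, f2)) / num_freqs


# Compare how well each sigma separates two points that are very close together.
d = 1e-5  # illustrative shift in normalized input units (an assumption)
for sigma in (2**8, 2**12, 2**16):
    print(sigma, round(rff_similarity(0.3, 0.3 + d, sigma), 4))
```

For the same small shift, the similarity stays close to 1 at $\sigma = 2^8$ and drops as $\sigma$ grows, which is the sense in which a branch with a higher $\sigma$ adds finer spatial detail.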

Additionally, I would recommend manually loading the pre-trained weights for the encoders with $\sigma_i \in \{2^0, 2^4, 2^8\}$ so that you have a warm start if you decide to fine-tune GeoCLIP. Alternatively, you could load and freeze these weights and fine-tune only the new branch of the location encoder. You can add further branches with increasing $\sigma$ values if your experiment needs higher resolution. As an additional tip, you could enforce meter-level smoothness in the learned representation by augmenting the GPS coordinates of your dataset with Gaussian noise whose standard deviation is a few meters, similar to what we did in our paper.
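The noise augmentation could be sketched roughly as follows. This is a hedged illustration, not the code used in the paper: the function name `jitter_gps`, the default of 5 m, and the flat-earth conversion of about 111,320 m per degree of latitude are assumptions, and the approximation breaks down close to the poles.

```python
import math
import random

# Approximate length of one degree of latitude in meters (assumed constant).
METERS_PER_DEGREE_LAT = 111_320.0


def jitter_gps(lat, lon, std_m=5.0, rng=random):
    """Return (lat, lon) in degrees perturbed by Gaussian noise of std_m meters.

    Independent north/south and east/west offsets are drawn in meters and
    converted to degrees with a local flat-earth approximation. The longitude
    conversion divides by cos(latitude), so it is unsuitable near the poles.
    """
    d_north_m = rng.gauss(0.0, std_m)
    d_east_m = rng.gauss(0.0, std_m)
    meters_per_degree_lon = METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    new_lat = lat + d_north_m / METERS_PER_DEGREE_LAT
    new_lon = lon + d_east_m / meters_per_degree_lon
    return new_lat, new_lon


# Example: jitter one coordinate pair with a seeded generator for repeatability.
rng = random.Random(0)
print(jitter_gps(48.0, 11.0, std_m=5.0, rng=rng))
```

Applying a fresh perturbation each time a sample is drawn during training, rather than once up front, exposes the encoder to many nearby positions for the same image.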

Please let us know if you have any more questions.