Skyy93 / Sample4Geo

Geo-Localization for Drone Images Using Cross-View Image Embeddings #23

Closed blackpearl1022 closed 1 month ago

blackpearl1022 commented 1 month ago

Thanks for your amazing work!

Problem

When the model extracts embeddings, the positional information of the image is not retained. This makes it impossible to pinpoint the exact position of the drone's target view based on the similarity of embeddings alone. The inability to retain positional data within the embeddings hampers the accuracy and reliability of the geo-localization process.

Steps to Reproduce

Expected Behavior

The model should retain the positional information within the embeddings, allowing for accurate determination of the drone's target view position based on the similarity of embeddings.

Actual Behavior

The positional information is lost during the embedding extraction process, making it impossible to accurately pinpoint the drone's target view position.

Possible Solution

Explore modifications to the embedding extraction process to retain positional information. Investigate alternative architectures or additional components that can preserve positional data within the embeddings. Consider combining the current embedding approach with a complementary method that retains positional information.

Additional Context

The model is implemented for geo-localization purposes in drone imagery. The primary objective is to accurately determine the position of the drone's target view using image embeddings.

So is there an optimized algorithm or another approach for getting the correct drone target view position with this model? Thanks!

Skyy93 commented 1 month ago

I think you misunderstood the task of cross-view geo-localisation. The ground-view/drone-view embeddings are used to find a close neighbor in a set of satellite images that are also embedded. The positions of the satellite images are known beforehand. Therefore, this work does not directly regress the positional information.
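For readers following along, the retrieval step described above can be sketched roughly as follows. This is a minimal illustration with NumPy and toy data, not code from the Sample4Geo repository: the function names, the 3-dimensional embeddings, and the coordinates are all made up, and cosine similarity is assumed as the matching score.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def retrieve_position(query_emb, gallery_embs, gallery_positions):
    """Return (index, position, similarity) of the best-matching gallery image.

    The position is simply looked up from the matched satellite image;
    nothing about the location is decoded from the embedding itself.
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    g = l2_normalize(np.asarray(gallery_embs, dtype=np.float64))
    sims = g @ q                      # cosine similarity to every gallery image
    best = int(np.argmax(sims))       # nearest neighbour in embedding space
    return best, gallery_positions[best], float(sims[best])

# Toy gallery: three satellite tiles with hypothetical (lat, lon) centres.
gallery_embs = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
gallery_positions = np.array([[48.10, 11.50],
                              [48.11, 11.52],
                              [48.12, 11.54]])

# Hypothetical drone-view embedding, closest to the second tile.
query_emb = np.array([0.1, 0.9, 0.2])
best, pos, sim = retrieve_position(query_emb, gallery_embs, gallery_positions)
```

The returned position is only as precise as the gallery's own georeferencing, which is why the embedding does not need to carry any positional information.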

blackpearl1022 commented 1 month ago

> I think you misunderstood the task of cross-view geo-localisation. The ground-view/drone-view embeddings are used to find a close neighbor in a set of satellite images that are also embedded. The positions of the satellite images are known beforehand. Therefore this is not a direct regression of the positional information present in this work.

Yeah, I know that. In the most common approach, we build a set of satellite images and find the best-matching satellite image for a given ground-view/drone-view image. But since we cannot tile the satellite map of a given area with a 1-pixel stride when building that set, we cannot recover the exact position of the drone image. We are really only picking up an estimated position, aren't we?
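To put a rough number on that estimate, here is a small sketch of the quantization error introduced by tiling. It assumes square tiles whose centres lie on a regular grid with spacing `stride_m` (in metres), a perfect retrieval, and that the matched tile's centre is reported as the position. The function names and the values are hypothetical.

```python
import math

def snap_to_tile_center(x_m, y_m, stride_m):
    """Centre of the grid cell containing (x_m, y_m); cell k has its centre at (k + 0.5) * stride."""
    cx = (math.floor(x_m / stride_m) + 0.5) * stride_m
    cy = (math.floor(y_m / stride_m) + 0.5) * stride_m
    return cx, cy

def worst_case_offset(stride_m):
    """Largest possible distance to the nearest tile centre: half the cell diagonal."""
    return stride_m * math.sqrt(2) / 2

stride_m = 100.0                      # hypothetical spacing between tile centres
true_xy = (130.0, 70.0)               # hypothetical true drone target position
est_xy = snap_to_tile_center(*true_xy, stride_m)
err = math.hypot(true_xy[0] - est_xy[0], true_xy[1] - est_xy[1])
bound = worst_case_offset(stride_m)
```

Under these assumptions the error shrinks linearly with the stride, but the gallery grows quadratically as the stride decreases, which is why a 1-pixel stride is impractical.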

Skyy93 commented 1 month ago

Ah, I see. If you need that, I would recommend training a simple regression module (one linear layer on top of the embeddings) to predict the offsets, like the authors of the TransGeo paper did (https://arxiv.org/abs/2204.00097).

In our work we did not focus on this kind of offset prediction. It's only the estimated position, or an answer to "is this streetview/drone-view position within this satellite/overhead image?".
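As a rough illustration of the "one linear layer on top of the embeddings" idea, the sketch below fits a linear map from embeddings to 2D offsets. It is not the TransGeo or Sample4Geo implementation: the data is synthetic, the names are hypothetical, and the layer is fitted in closed form with least squares instead of being trained by gradient descent in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 embeddings of dimension 16 and their (dx, dy)
# offsets from the matched tile centre. Real offsets need ground-truth labels.
n, d = 200, 16
embeddings = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, 2))
true_b = np.array([1.5, -0.5])
offsets = embeddings @ true_W + true_b + 0.01 * rng.normal(size=(n, 2))

def fit_linear_head(X, Y):
    """Fit Y ≈ X @ W + b by least squares; returns (W, b)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])   # extra column for the bias
    params, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return params[:-1], params[-1]

def predict_offsets(X, W, b):
    """Apply the linear head to a batch of embeddings."""
    return X @ W + b

# Fit on the first 150 samples and evaluate on the remaining 50.
W, b = fit_linear_head(embeddings[:150], offsets[:150])
pred = predict_offsets(embeddings[150:], W, b)
mae = float(np.mean(np.abs(pred - offsets[150:])))
```

The refined position would then be the matched tile's centre plus the predicted offset. Whether real embeddings carry enough sub-tile information for a linear head to work is an empirical question that this toy example does not answer.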

blackpearl1022 commented 1 month ago

Great, thanks!