cvg / Hierarchical-Localization

Visual localization made easy with hloc

visual localization without changing the database #316

Open · forEachWhileTrue opened this issue 1 year ago

forEachWhileTrue commented 1 year ago

Hello! First of all, I would like to say that I really appreciate your work, which I am currently using as part of my bachelor thesis. At the moment, I'm a bit stuck with localizing the camera pose for a query image.

I would like to describe my application a bit to clarify the problem: I want to render depth maps from a NeRF for occlusion handling in augmented reality on smartphone devices. The camera poses for the NeRF training are computed with hloc. So when the user is located in the real scene, which also exists as an hloc model and a NeRF model, I want to localize the user's position from the current smartphone camera frame with hloc, and then render the depth map from the NeRF for the computed camera pose. Of course, everything should happen as fast as possible. :)

My problem is as follows: for each query image, I only want to compute the camera pose, without adding the query itself to the database of the hloc model. I tried the approach from the demo notebook:

```python
from hloc import extract_features, match_features, pairs_from_exhaustive
from hloc.localize_sfm import QueryLocalizer, pose_from_cluster
import pycolmap

# `images`, `features`, `matches`, `loc_pairs`, `feature_conf`, `matcher_conf`,
# and the SfM reconstruction `model` are set up earlier in the notebook.
query = 'query/night.jpg'
references_registered = [model.images[i].name for i in model.reg_image_ids()]

# Extract local features for the query and match it against all references.
extract_features.main(feature_conf, images, image_list=[query], feature_path=features, overwrite=True)
pairs_from_exhaustive.main(loc_pairs, image_list=[query], ref_list=references_registered)
match_features.main(matcher_conf, loc_pairs, features=features, matches=matches, overwrite=True)

# Estimate the query pose with PnP + RANSAC against the 3D model.
camera = pycolmap.infer_camera_from_image(images / query)
ref_ids = [model.find_image_with_name(n).image_id for n in references_registered]
conf = {
    'estimation': {'ransac': {'max_error': 12}},
    'refinement': {'refine_focal_length': True, 'refine_extra_params': True},
}
localizer = QueryLocalizer(model, conf)
ret, log = pose_from_cluster(localizer, query, camera, ref_ids, features, matches)
```

It seemed to me that the query image was added to the database, which is unnecessary in my case. Is there maybe a way to avoid this? Ideally, I'm looking for a function that just extracts the features from the query image, matches them against a trained model, and then computes and returns the camera pose.

Also, I would be grateful to know how I could localize an image using retrieval with global descriptors, since I want to avoid exhaustive matching. Right now I'm using NetVLAD for retrieval and DISK + LightGlue for feature extraction and matching.

Thanks in advance!

sarlinpe commented 1 year ago

> It seemed to me that the query image was added to the database, which is unnecessary in my case. Is there maybe a way to avoid this? Ideally, I'm looking for a function that just extracts the features from the query image, matches them against a trained model, and then computes and returns the camera pose.

This sounds reasonable, but your code indeed adds the query features and matches to the respective files `features` and `matches`. We don't yet provide an interface for disk-less localization, though this is now a high priority. In the meantime, you can copy code from `extract_features` and `match_features` to recreate the pipeline without reading from and writing to disk.
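For reference, here is a rough sketch of that idea, assuming the `feature_conf` and `matcher_conf` from the snippet above and mirroring what `extract_features.main` and `match_features.main` do internally; the exact tensor keys and preprocessing can differ between hloc versions, so treat it as an outline rather than a tested recipe:

```python
import torch
from hloc import extractors, matchers
from hloc.utils.base_model import dynamic_load

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Instantiate the extractor and matcher once, outside the per-query loop.
Extractor = dynamic_load(extractors, feature_conf['model']['name'])
extractor = Extractor(feature_conf['model']).eval().to(device)
Matcher = dynamic_load(matchers, matcher_conf['model']['name'])
matcher = Matcher(matcher_conf['model']).eval().to(device)

@torch.no_grad()
def match_query_in_memory(image, ref_features):
    # image: float tensor of shape 1xCxHxW, preprocessed as in
    # hloc's extract_features (resizing, [0, 1] normalization).
    query_pred = extractor({'image': image.to(device)})
    matches_per_ref = {}
    for name, feats in ref_features.items():
        # Pair in-memory query features with reference features loaded
        # once from the existing feature file, using the same '0'/'1'
        # key convention as match_features.
        data = {k + '0': v for k, v in query_pred.items()}
        data.update({k + '1': v.to(device) for k, v in feats.items()})
        matches_per_ref[name] = matcher(data)['matches0']
    return matches_per_ref
```

The 2D-3D correspondences and the PnP + RANSAC step can then be assembled from these matches in memory, as `pose_from_cluster` does in `hloc.localize_sfm`.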

> Also, I would be grateful to know how I could localize an image using retrieval with global descriptors, since I want to avoid exhaustive matching. Right now I'm using NetVLAD for retrieval and DISK + LightGlue for feature extraction and matching.

Just use `pairs_from_retrieval` to generate the pairs and pipe them into `match_features`.
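As a sketch, replacing `pairs_from_exhaustive` in the snippet above with NetVLAD retrieval could look like this; `outputs` is the output directory from the demo notebook, and `num_matched=5` is an arbitrary placeholder:

```python
from hloc import extract_features, match_features, pairs_from_retrieval

# Global NetVLAD descriptors for both the database and the query images.
retrieval_conf = extract_features.confs['netvlad']
global_descriptors = extract_features.main(retrieval_conf, images, outputs)

# Retrieve the top-k most similar database images for the query
# instead of pairing it with every registered reference.
pairs_from_retrieval.main(
    global_descriptors, loc_pairs, num_matched=5,
    query_list=[query], db_list=references_registered)

# Match local features only for the retrieved pairs.
match_features.main(matcher_conf, loc_pairs, features=features, matches=matches, overwrite=True)
```

The rest of the localization (camera inference and `pose_from_cluster`) stays the same.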

forEachWhileTrue commented 1 year ago

Thank you very much for the quick and helpful reply! I will try this out.