cvg / Hierarchical-Localization

Visual localization made easy with hloc
Apache License 2.0

covisibility clustering for InLoc #260

Open Shubodh opened 1 year ago

Shubodh commented 1 year ago

I mentioned this in another issue here, but felt it was worth opening a separate one, since "covisibility clustering for InLoc" is listed as a valuable contribution in the README. Referencing it from the original issue:

@Skydes If I have a dataset like InLoc, with poses and 3D points for every reference image, is there an example which builds the COLMAP database (which involves creating unique IDs for Points3D, Camera, Image, etc.)? Getting the indexing right seems to be a tedious process, judging by the read_nvm_model() function. I am aware of pipeline_InLoc.ipynb, but I am not looking to localize a query image. Instead, I am trying to implement "covisibility clustering for InLoc". From what I understand, I can do it in one of two ways:

  1. If I have my data in COLMAP database format, it is very easy to implement covisibility clustering by passing that COLMAP reconstruction object to do_covisibility_clustering(). However, I don't want to run any reconstruction here, since I already have the 3D point sets and their corresponding poses.
  2. Otherwise, I can avoid the COLMAP database, but I will have to heavily rewrite do_covisibility_clustering(), which involves defining unique IDs for Points3D, Camera, Image, etc. If I have to do that anyway, I might as well create a COLMAP database and just do what I mentioned in the first point?

Let me know if I am on the right track, and please give further directions.

Originally posted by @Shubodh in https://github.com/cvg/Hierarchical-Localization/issues/7#issuecomment-1402724284

sarlinpe commented 1 year ago

Thanks for offering to help on this! IIRC there are a few issues with InLoc:

  1. The camera calibration of the reference images is not known with high accuracy (and might not even follow a pinhole model).
  2. The images are generally too sparsely distributed to reliably triangulate a 3D model with feature matching, so we indeed need to use the lidar scans.

The covisibility clustering requires visibility information, i.e. by which reference images a given 3D point is observed. You could try to project each 3D point into the other scans and filter out occlusions by checking whether the depth is consistent - but this will only work if the camera calibration is sufficiently accurate. You could use the calibration provided by the kapture format: https://github.com/naver/kapture/blob/main/doc/datasets.adoc#6-inloc
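That projection-and-depth-consistency check could be sketched as follows, assuming a pinhole model and a per-scan depth map (all names here are illustrative, not hloc or kapture API):

```python
import numpy as np

def is_point_visible(point_w, R, t, K, depth_map, depth_tol=0.05):
    """Check whether a world-space 3D point is visible in a reference scan.

    R, t: world-to-camera rotation (3x3) and translation (3,).
    K: 3x3 pinhole intrinsics (an assumption; InLoc cameras may deviate).
    depth_map: per-pixel depth of the reference scan, in metres.
    The point counts as visible if it projects inside the image and its
    camera-space depth agrees with the scan depth up to depth_tol (relative).
    """
    p_cam = R @ point_w + t
    if p_cam[2] <= 0:  # behind the camera
        return False
    uv = K @ p_cam
    u, v = uv[0] / uv[2], uv[1] / uv[2]
    h, w = depth_map.shape
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= ui < w and 0 <= vi < h):  # outside the image
        return False
    scan_depth = depth_map[vi, ui]
    # Occluded if the scan surface is significantly closer than the point.
    return abs(scan_depth - p_cam[2]) <= depth_tol * scan_depth
```

Points that pass this check for a given reference image would count as "observed" by it, which is exactly the visibility information the clustering needs.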

Shubodh commented 1 year ago

Thank you for the quick reply, Paul.

I certainly want to figure out these issues and contribute for InLoc eventually; however, I am currently working on another indoor dataset called RIO10.

1. hloc.pipelines.RIO10 module

This RIO10 dataset has RGB-D + pose (hence slightly different from InLoc). I have modified the InLoc pipeline, got it to work for RIO10 successfully, and matched their reported results for standard localization methods (like SP+SG+NetVLAD, etc.) using our new hloc.pipelines.RIO10 module :partying_face: :partying_face:! We actually discussed this over email some time back, and I also shared those results then. I will submit a PR for this as a new hloc.pipelines.RIO10 module in a month or so (my hloc is v1.2, so I have to make some changes).

2. Now coming back to our covisibility clustering (on RIO10):

RIO10 has densely distributed images, and the dataset provides camera calibration parameters for every room, so let us assume for now that it doesn't have InLoc's issues.

Now, for this RIO10 dataset, how do I go about implementing covisibility clustering with hloc/COLMAP? Getting the indexing right seems to be a tedious process, judging by the read_nvm_model() function; moreover, hloc nowhere seems to implement covisibility clustering from scratch, i.e. defining unique IDs, mapping indices, etc. Could you point me to another repo which implements covisibility clustering from scratch, so that I can understand the low-level details? Then I will have more clarity on how to integrate it with hloc/COLMAP as well.
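To make the question concrete, here is my current understanding of the core algorithm - a self-contained toy sketch of covisibility clustering (a BFS over images that share 3D points), using made-up data structures rather than hloc's or COLMAP's actual types:

```python
from collections import defaultdict

def covisibility_clusters(image_points):
    """Group images into clusters of mutually covisible views.

    image_points: dict mapping image_id -> set of 3D point ids it observes
    (a toy stand-in for a COLMAP reconstruction's visibility information).
    Two images are covisible if their point sets intersect; the clusters are
    the connected components of that covisibility graph, found via BFS.
    """
    # Invert the observation map: point id -> images that observe it.
    point_images = defaultdict(set)
    for image_id, points in image_points.items():
        for pt in points:
            point_images[pt].add(image_id)

    visited, clusters = set(), []
    for seed in image_points:
        if seed in visited:
            continue
        cluster, queue = [], {seed}
        while queue:
            image_id = queue.pop()
            if image_id in visited:
                continue
            visited.add(image_id)
            cluster.append(image_id)
            # Neighbours: any image sharing at least one 3D point.
            for pt in image_points[image_id]:
                queue |= point_images[pt] - visited
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Toy example: images 1 and 2 share point "A"; image 3 is isolated.
print(covisibility_clusters({1: {"A", "B"}, 2: {"A"}, 3: {"C"}}))
# -> [[1, 2], [3]]
```

Is this the right idea, and is defining the IDs/indices for real data the only part that's actually tedious?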

3. Bonus PR

As a bonus, I feel an indoor localization pipeline with covisibility for a generic RGB-D dataset would be a very useful addition to your amazing toolbox!! In fact, I can even submit a PR for an indoor equivalent of demo.ipynb, on a small indoor dataset equivalent to Sacre Coeur, eventually.

Shubodh commented 1 year ago

Could you please get back on this? @Skydes

sarlinpe commented 1 year ago
  1. hloc.pipelines.RIO10 module

Great, feel free to send a PR; it'd be great to have such a pipeline.

  2. Now coming back to our covisibility clustering (on RIO10):

You again have two options:

  1. Triangulate an SfM model using RGB-only images: this should work since the images are more densely distributed. You can just convert the RIO10 poses + intrinsics to an empty COLMAP model - this is easy with pycolmap:

    import pycolmap

    rec = pycolmap.Reconstruction()

    # One camera per distinct calibration; model_name, w, h, params
    # come from the RIO10 calibration files.
    camera_id = 1  # increment for each new camera
    cam1 = pycolmap.Camera(model_name, w, h, params, camera_id)
    rec.add_camera(cam1)

    # One image per reference view, with its world-to-camera pose
    # (qvec, tvec) and an empty list of 2D points.
    image_id = 1  # increment for each new image
    im1 = pycolmap.Image(image_name, [], tvec, qvec, camera_id, id=image_id)
    rec.add_image(im1)

    rec.write("path/")

    And then run the standard triangulation pipeline.

  2. Compute the visual overlap instead of the sparse covisibility: this is a good proxy and fairly easy with depth maps. This is what we use in LaMAR (implementation here): [screenshot of the overlap figure from the LaMAR paper] You can replace the mesh ray tracing with back-/re-projection and consistency checks of depth maps. Unlike option 1, this should be applicable to InLoc as well (if the camera calibration can be fixed).
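The back-/re-projection variant could be sketched like this - a toy pairwise overlap score, assuming shared pinhole intrinsics and world-to-camera (R, t) poses (names are illustrative, not the LaMAR implementation):

```python
import numpy as np

def overlap_ratio(depth_i, pose_i, depth_j, pose_j, K, tol=0.05):
    """Fraction of pixels in view i whose back-projected 3D points
    reproject into view j with consistent depth - a proxy for covisibility
    that replaces mesh ray tracing with depth-map checks.

    pose_*: (R, t) world-to-camera. K: shared 3x3 pinhole intrinsics
    (an assumption for this sketch). depth_* must be strictly positive.
    """
    R_i, t_i = pose_i
    R_j, t_j = pose_j
    h, w = depth_i.shape
    # Back-project every pixel of view i to world space.
    v, u = np.mgrid[0:h, 0:w]
    rays = np.linalg.inv(K) @ np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    p_cam_i = rays * depth_i.ravel()               # 3 x N camera-space points
    p_world = R_i.T @ (p_cam_i - t_i[:, None])
    # Reproject into view j.
    p_cam_j = R_j @ p_world + t_j[:, None]
    valid = p_cam_j[2] > 0                         # in front of camera j
    uv = K @ p_cam_j
    uj = np.round(uv[0] / uv[2]).astype(int)
    vj = np.round(uv[1] / uv[2]).astype(int)
    inside = valid & (uj >= 0) & (uj < w) & (vj >= 0) & (vj < h)
    # Depth-consistency check against view j's depth map.
    consistent = np.zeros_like(inside)
    dj = depth_j[vj[inside], uj[inside]]
    consistent[inside] = np.abs(dj - p_cam_j[2][inside]) <= tol * dj
    return consistent.mean()
```

Pairs with a high overlap ratio would then be treated as covisible when forming the clusters.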