cvg / Hierarchical-Localization

Visual localization made easy with hloc
Apache License 2.0

Camera Parameter Refinement during Localization #295

iperper closed 1 year ago

iperper commented 1 year ago

I am trying to better understand the impact of refine_extra_params during pose estimation.

I have set up a pipeline to localize images against a custom dataset. During localization, I create an OPENCV camera model because my camera intrinsics have different fx and fy parameters. However, I do not know the distortion parameters, so I create the camera as follows:

camera = pycolmap.Camera("OPENCV", 640, 480, [fx, fy, cx, cy, 0, 0, 0, 0])

For localization, I create a query localizer and perform localization.

conf = {
    'estimation': {'ransac': {'max_error': 12}},
    'refinement': {'refine_focal_length': False, 'refine_extra_params': True},
}
localizer = QueryLocalizer(model, conf)

I set refine_focal_length=False because I'm fairly confident in the focal length, but refine_extra_params=True since I don't have any prior on the distortion coefficients (although I do have a prior on cx, cy).

However, this leads to very poor localization results (they can be off by hundreds of meters). If I set refine_extra_params=False, the localization results become acceptable.

My questions are:

sarlinpe commented 1 year ago

If an image is constrained by few correspondences and your camera model is over-parametrized (e.g. the OPENCV model for an image without any distortion), then the pose refinement (a simple least-squares optimization of reprojection errors) can overfit the model to better explain noise in the correspondences (keypoint noise or mild outliers). This is less likely to occur with a less expressive camera model (e.g. SIMPLE_RADIAL). During registration, COLMAP actually refines the extra params only if no image of the same camera was registered before (link). However, COLMAP always refines the extra params in the BA (link), so the estimate improves as it triangulates more points or adds more images of the same camera.
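
The overfitting effect described above can be reproduced in a toy setting. The following is only a sketch under strong assumptions: the pose is held fixed, the true camera has zero distortion, points live in normalized image coordinates, and only two radial coefficients (k1, k2) are fitted by linear least squares with NumPy. The noise level and point counts are made-up values, and this is not how hloc or COLMAP implement refinement.

```python
import numpy as np

def radial_design_matrix(xy):
    """Columns multiply k1 and k2 in x_obs = x * (1 + k1*r^2 + k2*r^4)."""
    r2 = np.sum(xy**2, axis=1, keepdims=True)
    return np.stack([(xy * r2).ravel(), (xy * r2**2).ravel()], axis=1)

def fit_radial(xy, obs):
    """Linear least-squares estimate of (k1, k2) from ideal and observed points."""
    k, *_ = np.linalg.lstsq(radial_design_matrix(xy), (obs - xy).ravel(), rcond=None)
    return k

def rms_error(xy, obs, k):
    """RMS reprojection error (normalized coordinates) for given coefficients."""
    pred = xy + (radial_design_matrix(xy) @ k).reshape(xy.shape)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

rng = np.random.default_rng(0)
NOISE = 0.005  # hypothetical keypoint noise in normalized coordinates

def simulate(n):
    """Undistorted camera: observations are the ideal points plus noise."""
    xy = rng.uniform(-0.5, 0.5, size=(n, 2))
    return xy, xy + rng.normal(scale=NOISE, size=xy.shape)

xy_few, obs_few = simulate(6)        # a query with few correspondences
xy_test, obs_test = simulate(1000)   # held-out points from the same camera

k_fit = fit_radial(xy_few, obs_few)
err_fit = rms_error(xy_few, obs_few, k_fit)
err_zero = rms_error(xy_few, obs_few, np.zeros(2))

print("fitted (k1, k2):", k_fit)  # nonzero although the true distortion is zero
print("error on the 6 points, fitted vs. zero distortion:", err_fit, err_zero)
print("held-out error, fitted vs. zero distortion:",
      rms_error(xy_test, obs_test, k_fit), rms_error(xy_test, obs_test, np.zeros(2)))
```

On the points used for fitting, the fitted coefficients can never do worse than zero distortion, because zero is one of the candidates the least-squares fit considers. Comparing the held-out errors shows whether that gain reflects the camera or only the noise.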

  1. Are the images distorted at all? If not, why not use the PINHOLE model?
  2. How different are fx and fy? If they are almost identical (<5%?), then you could use the simpler models RADIAL or SIMPLE_RADIAL.
  3. Do you intend to register multiple images? If so:
     i. Are all images taken by the same camera? If so, you could estimate good camera params with the query that has the highest number of matches, and use these params (without refinement) for the other queries.
     ii. Are the query images covisible? If so, you could just run SfM (hloc.reconstruction) to let COLMAP register them, triangulate points, and refine them jointly with BA.
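
Suggestion 3.i could be organized roughly as follows. This is a minimal sketch, not hloc code: `localize` is a hypothetical callback standing in for whatever runs localization for a single query, and it is assumed to return a `(pose, camera_params)` pair. The match counts are assumed to be available per query beforehand.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical callback signature: (query_name, camera_params, refine_extra_params)
LocalizeFn = Callable[[str, Sequence[float], bool], Tuple[Any, Sequence[float]]]

def localize_with_shared_camera(
    num_matches: Dict[str, int],
    localize: LocalizeFn,
    initial_params: Sequence[float],
) -> Dict[str, Any]:
    """Refine intrinsics on the best-matched query, then freeze them for the rest."""
    # The query with the most matches constrains the camera parameters best.
    reference = max(num_matches, key=num_matches.get)
    ref_pose, shared_params = localize(reference, initial_params, True)

    poses = {reference: ref_pose}
    for name in num_matches:
        if name == reference:
            continue
        # Reuse the refined parameters and disable further refinement.
        poses[name], _ = localize(name, shared_params, False)
    return poses
```

This only makes sense if all queries really come from the same camera, which is the condition stated in 3.i.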

iperper commented 1 year ago

These are all very helpful insights, thanks!

My goal in using the OPENCV model was to express the different fx and fy (rather than to model any specific distortion), but since they are close (<1%), I can go for a simpler camera model.
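
As an illustration of that choice, here is a small sketch that picks a simpler model from OPENCV-style parameters. It relies on the COLMAP parameter orders (OPENCV: fx, fy, cx, cy, k1, k2, p1, p2; PINHOLE: fx, fy, cx, cy; SIMPLE_RADIAL: f, cx, cy, k). The helper names are hypothetical, the 5% default tolerance is the rough figure suggested earlier in the thread, and averaging fx and fy into a single focal length is an assumption made here, not something the thread prescribes. The distortion coefficients are dropped because they are unknown zeros in this setup.

```python
def relative_focal_difference(fx: float, fy: float) -> float:
    """Relative difference between the two focal lengths."""
    return abs(fx - fy) / max(fx, fy)

def opencv_to_simpler_model(params, tol=0.05):
    """Map OPENCV params [fx, fy, cx, cy, ...] to a less expressive model.

    Returns (model_name, params) in COLMAP parameter order.
    """
    fx, fy, cx, cy = params[:4]
    if relative_focal_difference(fx, fy) < tol:
        # Nearly equal focal lengths: a single f plus one radial term suffices.
        return "SIMPLE_RADIAL", [0.5 * (fx + fy), cx, cy, 0.0]
    # Clearly different focal lengths: keep fx and fy, drop the distortion terms.
    return "PINHOLE", [fx, fy, cx, cy]

model, simple_params = opencv_to_simpler_model([500.0, 503.0, 320.0, 240.0, 0, 0, 0, 0])
print(model, simple_params)
```

The returned name and parameter list can then be passed to the camera constructor in the same way as the OPENCV camera above.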

I will be registering multiple images, and can try both of your recommendations.