Camera Parameter Refinement during Localization

iperper commented 1 year ago

I am trying to better understand the impact of refine_extra_params during pose estimation.

I have set up a pipeline to localize images against a custom dataset. During localization, I use the OPENCV camera model because my camera intrinsics have different fx and fy. However, I do not know the distortion parameters, so I create the camera with all distortion coefficients set to zero:

camera = pycolmap.Camera("OPENCV", 640, 480, [fx, fy, cx, cy, 0, 0, 0, 0])
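
To see what the zeros mean, here is a small NumPy reimplementation of the OPENCV projection, written for illustration only (it is not pycolmap code, and `project_opencv` / `project_pinhole` are hypothetical helpers). It assumes COLMAP's parameter order [fx, fy, cx, cy, k1, k2, p1, p2]. With all four extra parameters at zero the camera behaves exactly like PINHOLE. Setting refine_extra_params=True frees k1, k2, p1 and p2, which is what lets the optimizer bend the projection.

```python
import numpy as np

def project_opencv(points_cam, params):
    """Project Nx3 camera-frame points with an OPENCV-style model.

    params = [fx, fy, cx, cy, k1, k2, p1, p2]
    """
    fx, fy, cx, cy, k1, k2, p1, p2 = params
    # Normalize onto the z = 1 plane.
    x = points_cam[:, 0] / points_cam[:, 2]
    y = points_cam[:, 1] / points_cam[:, 2]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    # Radial (k1, k2) plus tangential (p1, p2) distortion.
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + 2.0 * p2 * x * y + p1 * (r2 + 2.0 * y * y)
    return np.stack([fx * xd + cx, fy * yd + cy], axis=1)

def project_pinhole(points_cam, params):
    """PINHOLE model: params = [fx, fy, cx, cy], no distortion."""
    fx, fy, cx, cy = params
    x = points_cam[:, 0] / points_cam[:, 2]
    y = points_cam[:, 1] / points_cam[:, 2]
    return np.stack([fx * x + cx, fy * y + cy], axis=1)

# Demo: random points in front of an example camera.
rng = np.random.default_rng(0)
points = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(100, 3))
intrinsics = [500.0, 505.0, 320.0, 240.0]  # example fx, fy, cx, cy
zero_dist = project_opencv(points, intrinsics + [0.0, 0.0, 0.0, 0.0])
print(np.allclose(zero_dist, project_pinhole(points, intrinsics)))  # True

# A small k1 already moves the projections by a noticeable number of pixels.
bent = project_opencv(points, intrinsics + [-0.05, 0.0, 0.0, 0.0])
print("max shift (px):", np.abs(bent - zero_dist).max())
```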

I then create a query localizer and localize each query with it:

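from hloc.localize_sfm import QueryLocalizer

# model: the reference pycolmap.Reconstruction built from the custom dataset
conf = {
    'estimation': {'ransac': {'max_error': 12}},  # RANSAC inlier threshold (pixels)
    'refinement': {'refine_focal_length': False, 'refine_extra_params': True},
}
localizer = QueryLocalizer(model, conf)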

I set refine_focal_length=False because I am fairly confident in the focal length, but refine_extra_params=True because I have no prior on the distortion coefficients (although I do have a prior on cx and cy).

However, this leads to very poor localization results (poses can be off by hundreds of meters). If I set refine_extra_params=False, the localization results become acceptable.

My questions are:

1. Is there any guidance or documentation (here or in COLMAP) on the impact of refine_extra_params? In which scenarios does the refinement cause issues, and when should it work better (e.g. with diverse viewpoints among the retrieved images)?
2. The general guidance from COLMAP is to use the default parameters (i.e. refine the focal length and extra params). Is there a level of accuracy of the camera intrinsics at which it makes sense to fix them (e.g. within 5% of the true values)? For reconstruction, allowing refinement seems sensible because there are many images to optimize over. For pose estimation, however, I have less intuition about how the refinement is actually done.

sarlinpe commented 1 year ago

If an image is constrained by few correspondences and your camera model is over-parametrized (e.g. the OPENCV model for an image without any distortion), then pose refinement (a simple least-squares optimization of reprojection errors) can overfit the model to explain any noise in the correspondences (keypoint noise or mild outliers). This is less likely to occur with a less expressive camera model (e.g. SIMPLE_RADIAL). COLMAP actually refines the extra params during registration only if no image of the same camera was registered before (link). However, COLMAP always refines the extra params in bundle adjustment (BA) (link), so the estimate improves as it triangulates more points or adds more images of the same camera.
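
The overfitting effect can be illustrated with a toy 1D example. This is not COLMAP's solver: it assumes the pose is already known and fits only the radial terms of a model r_d = r * (1 + k1 r^2 + k2 r^4) to a handful of noisy radii from a camera that truly has no distortion. Freeing (k1, k2) always lowers the residual on those few correspondences, but it does so by moving the model away from the true, undistorted camera.

```python
import numpy as np

rng = np.random.default_rng(42)

# Ground truth: an undistorted camera, so the true k1 = k2 = 0.
# Only a few correspondences constrain the image.
r_true = rng.uniform(0.1, 0.8, size=8)  # normalized radii
# Observed radii carry keypoint noise (about 2.5 px at a 500 px focal length).
r_obs = r_true + rng.normal(scale=5e-3, size=r_true.shape)

# The radial model is linear in (k1, k2):  r_d - r = k1 * r^3 + k2 * r^5
A = np.stack([r_true**3, r_true**5], axis=1)
b = r_obs - r_true
k, *_ = np.linalg.lstsq(A, b, rcond=None)

rmse_fixed = np.sqrt(np.mean(b**2))                # extra params fixed at 0
rmse_refined = np.sqrt(np.mean((b - A @ k) ** 2))  # extra params refined

# Deviation of the refined model from the true (zero) distortion curve.
r_dense = np.linspace(0.0, 0.8, 50)
bias = np.max(np.abs(k[0] * r_dense**3 + k[1] * r_dense**5))

print(f"k1={k[0]:+.4f}, k2={k[1]:+.4f}")
print(f"RMSE fixed={rmse_fixed:.5f}, refined={rmse_refined:.5f}")
print(f"max deviation from the true model: {bias:.5f}")
```

The lower refined RMSE looks like an improvement, but it only reflects noise being absorbed into k1 and k2; with more correspondences or images (as in BA) the noise averages out and the spurious coefficients shrink.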

1. Are the images distorted at all? If not, why not use the PINHOLE model?
2. How different are fx and fy? If they are almost identical (<5%?), you could use the simpler RADIAL or SIMPLE_RADIAL models.
3. Do you intend to register multiple images? If so:
   i. Are all images taken by the same camera? If so, you could estimate good camera parameters with the query that has the highest number of matches, and use these parameters (without refinement) for the other queries.
   ii. Are the query images covisible? If so, you could just run SfM (hloc.reconstruction) to let COLMAP register them, triangulate points, and refine them jointly with BA.
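
Suggestion 3.i can be planned with a few lines of Python. The sketch below is a hypothetical helper (`plan_refinement` is not part of hloc); its config dicts simply mirror the conf used in the question. It orders queries by match count so the best-constrained query is localized first with refine_extra_params=True, and every later query keeps the extra params fixed. Copying the refined parameters from the reference query into the shared camera is left out, since how to read them back depends on the hloc/pycolmap version.

```python
from typing import Dict, List, Tuple

def plan_refinement(num_matches: Dict[str, int]) -> Tuple[List[str], Dict[str, dict]]:
    """Order queries sharing one camera and build a per-query config.

    The query with the most matches comes first and is the only one that
    refines the extra (distortion) params; the others reuse its estimate.
    """
    order = sorted(num_matches, key=num_matches.get, reverse=True)
    configs = {}
    for i, name in enumerate(order):
        configs[name] = {
            "estimation": {"ransac": {"max_error": 12}},
            "refinement": {
                "refine_focal_length": False,
                "refine_extra_params": i == 0,  # only the reference query
            },
        }
    return order, configs

order, configs = plan_refinement({"q1": 150, "q2": 420, "q3": 80})
print(order)  # ['q2', 'q1', 'q3']
```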

iperper commented 1 year ago

These are all very helpful insights, thanks!

My goal in using the OPENCV model was to express the different fx and fy (rather than to model any specific distortion), but since they are close (<1%), I can use a simpler camera model.
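
Choosing the simpler model can be written as a small decision rule following the suggestions above. `simplify_intrinsics` is a hypothetical helper: averaging fx and fy into a single focal length and the 5% default tolerance are assumptions (the thread only gives "<5%?" as a rough guide). It returns a model name and a parameter list in COLMAP's order, with distortion terms starting at zero. RADIAL (two radial coefficients) would be an equally reasonable choice in place of SIMPLE_RADIAL.

```python
from typing import List, Tuple

def simplify_intrinsics(fx: float, fy: float, cx: float, cy: float,
                        distorted: bool, tol: float = 0.05) -> Tuple[str, List[float]]:
    """Pick the least expressive COLMAP model that fits the known intrinsics."""
    rel_diff = abs(fx - fy) / max(fx, fy)
    if rel_diff < tol:
        f = 0.5 * (fx + fy)  # assumption: average the two focal lengths
        if distorted:
            return "SIMPLE_RADIAL", [f, cx, cy, 0.0]    # [f, cx, cy, k]
        return "SIMPLE_PINHOLE", [f, cx, cy]            # [f, cx, cy]
    if distorted:
        return "OPENCV", [fx, fy, cx, cy, 0.0, 0.0, 0.0, 0.0]
    return "PINHOLE", [fx, fy, cx, cy]                  # [fx, fy, cx, cy]

# fx and fy differ by about 0.6%, so a single focal length is enough.
print(simplify_intrinsics(500.0, 503.0, 320.0, 240.0, distorted=True))
```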

I will be registering multiple images, and can try both of your recommendations.

cvg/Hierarchical-Localization, issue #295