SpectacularAI / 3dgs-deblur

[ECCV2024] Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
https://spectacularai.github.io/3dgs-deblur/
Apache License 2.0

Question regarding lens distortion and rolling shutter #11

Closed: specarmi closed this issue 1 month ago

specarmi commented 1 month ago

Thanks for the great paper and codebase! I have a question about how lens distortion is handled in combination with rolling shutter.

As I understand it, the rolling shutter readout time as a function of the row index, as shown in equation 8 of the paper, is valid for pixel coordinates before undistortion, but not for pixel coordinates after undistortion. Despite this, it seems the Gaussians are projected into pixel coordinates assuming an ideal camera without distortion, while the training images are undistorted as they are loaded.

Is this relationship between undistortion and rolling shutter readout timing handled in the implementation, or is it assumed to be negligible?

specarmi commented 1 month ago

Here's some code to visualize what I'm talking about and a suggestion to fix it (assuming my understanding is correct).

Code:

import cv2
import argparse
from pathlib import Path
import yaml
import numpy as np
import matplotlib.pyplot as plt

def display_lookup_table(title, t_roll_lookup):
    plt.imshow(t_roll_lookup)
    plt.colorbar()
    plt.title(title)
    plt.show()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Visualize impact of undistortion on rolling shutter readout timing.")
    parser.add_argument("--path-transforms-file", type=Path, required=True, help="Path to transforms.json.")
    args = parser.parse_args()

    with open(args.path_transforms_file, "r") as file_object:
        meta = yaml.safe_load(file_object)

    # Load intrinsics
    w = meta['w']
    h = meta['h']
    fx = meta['fl_x']
    fy = meta['fl_y']
    cx = meta['cx']
    cy = meta['cy']
    k1 = meta['k1']
    k2 = meta['k2']
    p1 = meta['p1']
    p2 = meta['p2']
    camera_matrix = np.eye(3)
    camera_matrix[0, 0] = fx
    camera_matrix[1, 1] = fy
    camera_matrix[0, 2] = cx
    camera_matrix[1, 2] = cy
    distortion_coefficients = np.array([k1, k2, p1, p2])

    # Compute the ideal camera matrix
    camera_matrix_ideal, _ = cv2.getOptimalNewCameraMatrix(camera_matrix, distortion_coefficients, (w, h), alpha=0)

    # Create and display lookup table for rolling shutter readout _before_ undistortion
    T_ro = meta['rolling_shutter_time']
    t_roll_lookup_distorted = np.zeros((h, w), dtype=float)
    for i_row in range(h):
        t_roll_lookup_distorted[i_row] = (i_row / h - 0.5) * T_ro
    display_lookup_table('distorted', t_roll_lookup_distorted)

    # Create and display lookup table for rolling shutter readout _after_ undistortion
    t_roll_lookup_undistorted = cv2.undistort(t_roll_lookup_distorted, camera_matrix, distortion_coefficients, None, camera_matrix_ideal)
    display_lookup_table('undistorted', t_roll_lookup_undistorted)

    # Display the error between them
    display_lookup_table('error', t_roll_lookup_distorted - t_roll_lookup_undistorted)

Result

This is the error image when the code is run on colmap-sai-cli-vels-blur-scored/s20-bike/transforms.json. It suggests that, if lens distortion is not accounted for, the rolling shutter readout time error can be as large as ~0.3 ms.

[Attached image: t_roll_error]

Suggested fix

Instead of computing the rolling shutter readout time as $(\frac{y}{H} - \frac{1}{2})T_{ro}$ (as in equation 8 of the paper), you could compute t_roll_lookup_undistorted as in the code above and look up the readout time as t_roll_lookup_undistorted[y, x]. Note that in this case $\Delta t_k(y)$ becomes $\Delta t_k(x, y)$.
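
As a rough sketch of what that could look like (assuming the same OpenCV intrinsics and readout time as in the script above; the helper names here are my own, not anything from the repository):

import cv2
import numpy as np

def build_readout_lookup(camera_matrix, distortion_coefficients, camera_matrix_ideal, w, h, T_ro):
    # Per-row readout time in the original (distorted) image, as in equation 8.
    rows = (np.arange(h, dtype=np.float32) / h - 0.5) * T_ro
    lookup_distorted = np.repeat(rows[:, None], w, axis=1)
    # Warp the table into undistorted pixel coordinates so it can be indexed
    # with the same (x, y) used when projecting the Gaussians.
    return cv2.undistort(lookup_distorted, camera_matrix, distortion_coefficients,
                         None, camera_matrix_ideal)

def readout_time(lookup_undistorted, x, y):
    # Delta t_k(x, y): per-pixel readout time for an undistorted pixel (x, y).
    return lookup_undistorted[int(round(y)), int(round(x))]

The table only needs to be built once per camera, so the extra cost at render time is a single lookup per pixel.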

oseiskar commented 1 month ago

Hello. Thank you for the detailed analysis, and sorry for the slow reply.

You are correct that the combination of distortion and the rolling shutter effect is not modeled accurately in the paper, and your version should be more accurate. However, there are some practical problems, which are the reason behind the model used in the paper:

  1. The version you suggest would not be as simple to implement on top of the (then) existing gsplat code.
  2. The accuracy of the calibration and of the assumed projective compensation of the rolling shutter effect via the distortion parameters, as briefly discussed in the "Alternative intrinsics" paragraph in Appendix D of the paper.

Let me elaborate on point 2 a bit. One problem with both the model in the paper and your model is that they assume we have an accurate model (intrinsic calibration) of the camera distortion without the rolling shutter effect. But do we? If we use COLMAP, its optimization will use the distortion coefficients and intrinsics to compensate for the rolling shutter effect as well as possible, which would be particularly bad if we did not assume shared intrinsics for all the cameras. Consequently, the COLMAP-derived intrinsics may not accurately model the calibration in the rolling-shutter-free case. Table 5 in the paper also hints in this direction: in some high-readout cases, we get better 3DGS novel-view accuracy if we manually calibrate the camera using Kalibr. In the low-readout cases, COLMAP gives better overall results than manual calibration (presumably because the accuracy of Kalibr is also limited).

My guess would be that, to get measurable benefits from your model, you would need to handle calibration differently than with COLMAP (one option being a manual calibration stage similar to the one used for Table 5), and this would be a bit out of the scope of the paper.

specarmi commented 1 month ago

Hi, I appreciate the response! Thanks for confirming my understanding and explaining your perspective. I had been assuming accurate calibration, and I agree that could be difficult to obtain for blurry rolling shutter smartphone cameras. Though in addition to calibrating with a calibration target and Kalibr, as you mentioned, I think you could also record an additional sequence with slow egomotion and use that to calibrate with COLMAP. It becomes harder, and definitely out of scope, if you only have access to a single highly dynamic video. Maybe in that case you could optimize the intrinsics during training.
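
For what it's worth, here is a rough sketch of what optimizing the intrinsics during training could look like in plain PyTorch (purely illustrative, not tied to this repository's code; all names are my own):

import torch

class LearnableIntrinsics(torch.nn.Module):
    # Illustrative only: fx, fy, cx, cy and the distortion coefficients become trainable.
    def __init__(self, fx, fy, cx, cy, dist_coeffs):
        super().__init__()
        self.focal = torch.nn.Parameter(torch.tensor([fx, fy], dtype=torch.float32))
        self.center = torch.nn.Parameter(torch.tensor([cx, cy], dtype=torch.float32))
        self.dist = torch.nn.Parameter(torch.tensor(dist_coeffs, dtype=torch.float32))

    def project(self, points_cam):
        # Pinhole projection with OpenCV-style radial/tangential distortion (k1, k2, p1, p2).
        x = points_cam[:, 0] / points_cam[:, 2]
        y = points_cam[:, 1] / points_cam[:, 2]
        r2 = x * x + y * y
        k1, k2, p1, p2 = self.dist
        radial = 1 + k1 * r2 + k2 * r2 * r2
        x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        u = self.focal[0] * x_d + self.center[0]
        v = self.focal[1] * y_d + self.center[1]
        return torch.stack([u, v], dim=-1)

# The parameters would then be added to the existing optimizer, e.g.
# optimizer.add_param_group({"params": intrinsics.parameters(), "lr": 1e-4})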

Anyway, since the question was answered, I'll close the issue!