KAIR-BAIR / dycheck

Official JAX Implementation of Monocular Dynamic View Synthesis: A Reality Check (NeurIPS 2022)
https://hangg7.com/dycheck
Apache License 2.0

Need clarification on camera convention for iPhone dataset #12

Open · NagabhushanSN95 opened 3 weeks ago

NagabhushanSN95 commented 3 weeks ago

From the existing documentation, it appears that the camera extrinsics follow the OpenCV convention, i.e. (x, -y, -z) relative to OpenGL, with axes pointing (right, down, into the scene), and are stored in world-to-camera (w2c) format. Is this right?
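
To make sure we are talking about the same thing, here is the projection I am assuming, as a minimal sketch (the helper name is mine):

    import numpy

    def project(world_point, extrinsic_w2c, intrinsic):
        # OpenCV convention: camera axes point (right, down, into the scene);
        # a w2c extrinsic maps world points directly into camera coordinates.
        cam_point = extrinsic_w2c[:3, :3] @ world_point + extrinsic_w2c[:3, 3]
        pixel = intrinsic @ cam_point
        return pixel[:2] / pixel[2]  # cam_point[2] > 0 in front of the camera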

I tried warping a frame from the apple scene to another viewpoint using the provided depth, but the warped frame does not match the second frame. Can you help me with the details? Should the depth be scaled?

I read the data and warp the frame with the code below:

    import json

    import numpy

    def read_camera(camera_params_path):
        with open(camera_params_path) as camera_params_file:
            camera_params = json.load(camera_params_file)

        focal_length = camera_params['focal_length']
        principal_point = camera_params['principal_point']
        intrinsic = numpy.eye(3)
        intrinsic[0, 0] = focal_length
        intrinsic[1, 1] = focal_length
        intrinsic[0, 2] = principal_point[0]
        intrinsic[1, 2] = principal_point[1]

        # Assuming 'orientation' is the w2c rotation and 'position' is the
        # w2c translation -- this is exactly the convention I need clarified.
        rotation_matrix = numpy.array(camera_params['orientation'])
        translation_vector = numpy.array(camera_params['position'])
        extrinsic = numpy.eye(4)
        extrinsic[:3, :3] = rotation_matrix
        extrinsic[:3, 3] = translation_vector
        return intrinsic, extrinsic

    intrinsic1, extrinsic1 = read_camera(camera1_params_path)
    intrinsic2, extrinsic2 = read_camera(camera2_params_path)

    warper = Warper()  # forward-warping utility from my own code
    frame1 = warper.read_image(frame1_path)[:, :, :3]
    frame2 = warper.read_image(frame2_path)[:, :, :3]
    depth1 = warper.read_depth(depth1_path)[:, :, 0]

    warped_frame2 = warper.forward_warp(frame1, None, depth1, extrinsic1, extrinsic2, intrinsic1, intrinsic2)[0]
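
For reference, the per-pixel operation I expect forward_warp to perform is roughly the following (a minimal sketch assuming z-depth and w2c extrinsics; the helper is mine, not part of the dataset code):

    def warp_pixel(u, v, depth, extrinsic1, extrinsic2, intrinsic1, intrinsic2):
        # Unproject pixel (u, v) of frame 1 using its z-depth.
        cam1_point = depth * (numpy.linalg.inv(intrinsic1) @ numpy.array([u, v, 1.0]))
        # Lift to world coordinates, then move into camera 2 (both extrinsics w2c).
        world_point = numpy.linalg.inv(extrinsic1) @ numpy.append(cam1_point, 1.0)
        cam2_point = (extrinsic2 @ world_point)[:3]
        # Project with camera 2's intrinsics.
        pixel = intrinsic2 @ cam2_point
        return pixel[:2] / pixel[2]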

The frames look like below (images attached: frame1, frame2, frame2_warped).

NagabhushanSN95 commented 3 weeks ago

Update: I took 8 consecutive frames from each of the three videos to get 30 frames. Assuming that the object motion between consecutive frames is small, I ran COLMAP on them. The relative rotation matrices I got match, but the translations don't, and not by a simple scale factor either; they are very different. An example is below.
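
By "relative extrinsics" I mean the w2c transform taking camera-1 coordinates to camera-2 coordinates, computed along these lines:

    import numpy

    def relative_extrinsic(extrinsic1, extrinsic2):
        # Maps camera-1 coordinates to camera-2 coordinates when both
        # inputs are world-to-camera (w2c) matrices.
        return extrinsic2 @ numpy.linalg.inv(extrinsic1)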

Relative extrinsics between frames 0_00008 and 1_00008, obtained from COLMAP:

array([[ 0.8271133 ,  0.41969466, -0.37381811,  4.28686663],
       [-0.52556144,  0.81325596, -0.24979976, -1.12470238],
       [ 0.19917018,  0.40307709,  0.89323015,  4.24574654],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Relative extrinsics between the same frames, obtained from the dataset:

array([[ 0.82652076,  0.42485099, -0.36927638, -0.17995098],
       [-0.52475049,  0.8189421 , -0.23231606,  0.08859444],
       [ 0.20371627,  0.38579203,  0.8998134 ,  0.07466149],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])
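
To rule out a single global scale, the check I ran was along these lines, with the two matrices above assigned to colmap_rel and dataset_rel (the variable names are mine):

    import numpy

    t_colmap = colmap_rel[:3, 3]
    t_dataset = dataset_rel[:3, 3]
    # If the translations differed only by a global scale,
    # these unit directions would (nearly) coincide -- they do not.
    print(t_colmap / numpy.linalg.norm(t_colmap))
    print(t_dataset / numpy.linalg.norm(t_dataset))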