KAIR-BAIR / dycheck

Official JAX Implementation of Monocular Dynamic View Synthesis: A Reality Check (NeurIPS 2022)
https://hangg7.com/dycheck
Apache License 2.0

Need clarification on camera convention for iPhone dataset #12

Open · NagabhushanSN95 opened 3 weeks ago

NagabhushanSN95 commented 3 weeks ago

From the existing documentation, it appears that the camera extrinsics follow the OpenCV convention, i.e. (x, -y, -z) relative to OpenGL, with axes pointing (right, down, into the scene), and are stored in world-to-camera (w2c) format. Is this right?
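
To make sure we are talking about the same thing, here is the projection I am assuming, as a minimal sketch (the helper name is mine):

    import numpy

    def project(world_point, extrinsic_w2c, intrinsic):
        # OpenCV convention: camera axes point (right, down, into the scene);
        # a w2c extrinsic maps world points directly into camera coordinates.
        cam_point = extrinsic_w2c[:3, :3] @ world_point + extrinsic_w2c[:3, 3]
        pixel = intrinsic @ cam_point
        return pixel[:2] / pixel[2]  # cam_point[2] > 0 in front of the camera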

I tried warping a frame from the apple scene to another viewpoint using the provided depth, but the warped frame does not match the second frame. Can you help me with the details? Should the depth be scaled?

I read the data and warp the frame with the code below:

    import json

    import numpy

    def read_camera(camera_params_path):
        with open(camera_params_path) as camera_params_file:
            camera_params = json.load(camera_params_file)

        focal_length = camera_params['focal_length']
        principal_point = camera_params['principal_point']
        intrinsic = numpy.eye(3)
        intrinsic[0, 0] = focal_length
        intrinsic[1, 1] = focal_length
        intrinsic[0, 2] = principal_point[0]
        intrinsic[1, 2] = principal_point[1]

        # Assuming 'orientation' is the w2c rotation and 'position' is the
        # w2c translation -- this is exactly the convention I need clarified.
        rotation_matrix = numpy.array(camera_params['orientation'])
        translation_vector = numpy.array(camera_params['position'])
        extrinsic = numpy.eye(4)
        extrinsic[:3, :3] = rotation_matrix
        extrinsic[:3, 3] = translation_vector
        return intrinsic, extrinsic

    intrinsic1, extrinsic1 = read_camera(camera1_params_path)
    intrinsic2, extrinsic2 = read_camera(camera2_params_path)

    warper = Warper()  # forward-warping utility from my own code
    frame1 = warper.read_image(frame1_path)[:, :, :3]
    frame2 = warper.read_image(frame2_path)[:, :, :3]
    depth1 = warper.read_depth(depth1_path)[:, :, 0]

    warped_frame2 = warper.forward_warp(frame1, None, depth1, extrinsic1, extrinsic2, intrinsic1, intrinsic2)[0]
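
For reference, the per-pixel operation I expect forward_warp to perform is roughly the following (a minimal sketch assuming z-depth and w2c extrinsics; the helper is mine, not part of the dataset code):

    def warp_pixel(u, v, depth, extrinsic1, extrinsic2, intrinsic1, intrinsic2):
        # Unproject pixel (u, v) of frame 1 using its z-depth.
        cam1_point = depth * (numpy.linalg.inv(intrinsic1) @ numpy.array([u, v, 1.0]))
        # Lift to world coordinates, then move into camera 2 (both extrinsics w2c).
        world_point = numpy.linalg.inv(extrinsic1) @ numpy.append(cam1_point, 1.0)
        cam2_point = (extrinsic2 @ world_point)[:3]
        # Project with camera 2's intrinsics.
        pixel = intrinsic2 @ cam2_point
        return pixel[:2] / pixel[2]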

The frames look like below (images attached: frame1, frame2, frame2_warped).

NagabhushanSN95 commented 3 weeks ago

Update: I took 8 consecutive frames from each of the three videos to get 30 frames. Assuming that the object motion between consecutive frames is small, I ran COLMAP on them. The relative rotation matrices I got match, but the translations don't, and not by a simple scale factor either; they are very different. An example is below.
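
By "relative extrinsics" I mean the w2c transform taking camera-1 coordinates to camera-2 coordinates, computed along these lines:

    import numpy

    def relative_extrinsic(extrinsic1, extrinsic2):
        # Maps camera-1 coordinates to camera-2 coordinates when both
        # inputs are world-to-camera (w2c) matrices.
        return extrinsic2 @ numpy.linalg.inv(extrinsic1)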

Relative extrinsics between frames 0_00008 and 1_00008, obtained from COLMAP:

array([[ 0.8271133 ,  0.41969466, -0.37381811,  4.28686663],
       [-0.52556144,  0.81325596, -0.24979976, -1.12470238],
       [ 0.19917018,  0.40307709,  0.89323015,  4.24574654],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Relative extrinsics between the same frames, obtained from the dataset:

array([[ 0.82652076,  0.42485099, -0.36927638, -0.17995098],
       [-0.52475049,  0.8189421 , -0.23231606,  0.08859444],
       [ 0.20371627,  0.38579203,  0.8998134 ,  0.07466149],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])
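
To rule out a single global scale, the check I ran was along these lines, with the two matrices above assigned to colmap_rel and dataset_rel (the variable names are mine):

    import numpy

    t_colmap = colmap_rel[:3, 3]
    t_dataset = dataset_rel[:3, 3]
    # If the translations differed only by a global scale,
    # these unit directions would (nearly) coincide -- they do not.
    print(t_colmap / numpy.linalg.norm(t_colmap))
    print(t_dataset / numpy.linalg.norm(t_dataset))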