cmusatyalab / mega-nerf

MIT License

questions about c2w in colmap_to_mega_nerf.py after model_aligner #54

Closed hvkwak closed 1 year ago

hvkwak commented 1 year ago

I'd like to know where I'm going wrong when running colmap_to_mega_nerf.py from this repo to train on a custom dataset that is geo-referenced with COLMAP's model_aligner. After analyzing several repos and the issues in this repo, I noticed that (due to model_aligner) my camera poses presumably start with the XYZ axes being either (Forward, Left, Up) or (Right, Backward, Up). For me it is important that Z points up; the XY axes can be anything. (The permutation matrices FLU_2_DRB and RBU_2_DRB, which transform these two conventions to DRB (Down, Right, Backward), have a positive and a negative determinant, respectively.)
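(A quick self-contained check of that determinant observation; the two matrices below are copied from the script further down.)

import numpy as np

FLU_2_DRB = np.array([[0.0, 0.0, -1.0], [0.0, -1.0, 0.0], [-1.0, 0.0, 0.0]])
RBU_2_DRB = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

print(np.linalg.det(FLU_2_DRB))  # +1.0 -> proper rotation, no reflection
print(np.linalg.det(RBU_2_DRB))  # -1.0 -> contains a reflection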

The full script below first visualizes the original camera poses, all looking at (0, 0, 0); the second visualization then applies FLU_2_DRB to the rotation matrix and the translation vector of the camera extrinsics. Unfortunately, the transformed camera poses no longer all look at (0, 0, 0); somehow they look 90 degrees away. Reordering the columns of the [R|t] matrix into (1, -0, 2, 3) order was not a solution either; that only rotates the camera image planes.

Where am I making a mistake? Any ideas or help would be much appreciated. Thanks.

import numpy as np
import open3d as o3d

# code adapted from several repositories and Stack Overflow; camera visualization functions are from NeRF++

# RDF to DRB (from mega-nerf)
RDF_2_DRB = np.array([
    [0.0, 1.0, 0.0], 
    [1.0, 0.0, 0.0], 
    [0.0, 0.0, -1.0]
])    

# FLU to DRB
FLU_2_DRB = np.array([
    [0.0, 0.0, -1.0], 
    [0.0, -1.0, 0.0], 
    [-1.0, 0.0, 0.0]
])

# RBU to DRB
RBU_2_DRB = np.array([
    [0.0, 0.0, -1.0], 
    [1.0, 0.0, 0.0], 
    [0.0, 1.0, 0.0]
])

def get_camera_frustum(img_size, K, c2w, frustum_length=0.3, color=[0., 1., 0.]):
    W, H = img_size
    hfov = np.rad2deg(np.arctan(W / 2. / K[0, 0]) * 2.)
    vfov = np.rad2deg(np.arctan(H / 2. / K[1, 1]) * 2.)
    half_w = frustum_length * np.tan(np.deg2rad(hfov / 2.))
    half_h = frustum_length * np.tan(np.deg2rad(vfov / 2.))

    # build view frustum for camera (I, 0)
    frustum_points = np.array([[0., 0., 0.],                          # frustum origin
                               [-half_w, -half_h, frustum_length],    # top-left image corner
                               [half_w, -half_h, frustum_length],     # top-right image corner
                               [half_w, half_h, frustum_length],      # bottom-right image corner
                               [-half_w, half_h, frustum_length]])    # bottom-left image corner
    frustum_lines = np.array([[0, i] for i in range(1, 5)] + [[i, (i+1)] for i in range(1, 4)] + [[4, 1]])
    frustum_colors = np.tile(np.array(color).reshape((1, 3)), (frustum_lines.shape[0], 1))

    # frustum_colors = np.vstack((np.tile(np.array([[1., 0., 0.]]), (4, 1)),
    #                            np.tile(np.array([[0., 1., 0.]]), (4, 1))))

    # transform view frustum from (I, 0) to (R, t)
    frustum_points = np.dot(np.hstack((frustum_points, np.ones_like(frustum_points[:, 0:1]))), c2w.T)
    frustum_points = frustum_points[:, :3] / frustum_points[:, 3:4]

    return frustum_points, frustum_lines, frustum_colors

def frustums2lineset(frustums):
    N = len(frustums)
    merged_points = np.zeros((N*5, 3))      # 5 vertices per frustum
    merged_lines = np.zeros((N*8, 2), dtype=np.int32)  # 8 lines per frustum (integer vertex indices)
    merged_colors = np.zeros((N*8, 3))      # each line gets a color

    for i, (frustum_points, frustum_lines, frustum_colors) in enumerate(frustums):
        merged_points[i*5:(i+1)*5, :] = frustum_points
        merged_lines[i*8:(i+1)*8, :] = frustum_lines + i*5
        merged_colors[i*8:(i+1)*8, :] = frustum_colors

    lineset = o3d.geometry.LineSet()
    lineset.points = o3d.utility.Vector3dVector(merged_points)
    lineset.lines = o3d.utility.Vector2iVector(merged_lines)
    lineset.colors = o3d.utility.Vector3dVector(merged_colors)
    return lineset

def visualize_cameras_custom(sphere_radius, cameras, camera_size=0.2):
    sphere = o3d.geometry.TriangleMesh.create_sphere(radius=sphere_radius, resolution=10)
    sphere = o3d.geometry.LineSet.create_from_triangle_mesh(sphere)
    sphere.paint_uniform_color((1, 0, 0))

    coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5, origin=[0., 0., 0.])
    things_to_draw = [sphere, coord_frame]

    K = np.array([
        [500, 0.0, 250, 0.0],
        [0.0, 300, 150, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]
    ])
    img_size = (500, 300)
    frustums = []
    color = [0.0, 0.0, 1.0]
    for camera in cameras:
        c2w = camera
        frustums.append(get_camera_frustum(img_size, K, c2w, frustum_length=camera_size, color=color))
    cameras = frustums2lineset(frustums)
    things_to_draw.append(cameras)
    o3d.visualization.draw_geometries(things_to_draw)

def rotateA2B(vecA, vecB):
    # rotation matrix that rotates unit vector vecA onto unit vector vecB
    # (Rodrigues-style formula; undefined when vecA and vecB are antiparallel, i.e. c == -1)
    vecA = vecA/np.sqrt(np.sum(vecA**2))
    vecB = vecB/np.sqrt(np.sum(vecB**2))
    v = np.cross(vecA, vecB)
    c = np.sum(vecA*vecB)
    v_cross = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.identity(3) + v_cross + np.matmul(v_cross, v_cross)*(1/(1+c))

def sphere(r, theta, phi):
    # returns the point (x, y, z) on a sphere of radius r; theta is the polar angle from +z, phi the azimuth
    assert 0.0 <= theta <= np.pi/2.0
    assert 0.0 <= phi <= np.pi
    return np.array([r*np.sin(theta)*np.sin(phi), r*np.sin(theta)*np.cos(phi), r*np.cos(theta)])

def main():
    thetas = np.linspace(0.1, np.pi/2.0, 5)
    phis = np.linspace(0.1, np.pi, 5)
    std_vec = np.array([0.0, 0.0, 1.0])
    r = 1.0
    bottom = np.array([0.0, 0.0, 0.0, 1.0])
    for transform in [False, True]:
        cameras = []
        for theta in thetas:
            for phi in phis:
                t = sphere(r, theta, phi)
                rotation = rotateA2B(std_vec, -t)
                if transform:
                    t = np.matmul(FLU_2_DRB, t)
                    rotation = np.matmul(FLU_2_DRB, rotation) @ np.linalg.inv(FLU_2_DRB)
                cameras.append(np.vstack([np.hstack([rotation, t.reshape(-1, 1)]), bottom]))
        visualize_cameras_custom(1.5, cameras)

if __name__ == "__main__":
    main()
hvkwak commented 1 year ago

I finally figured this out. The coordinate system of my dataset is RFU, and we need it in DRB. (Typical COLMAP output may be RDF; note that Mega-NeRF goes from RDF to DRB.) The key idea is that we use permutation matrices, but we have to know when a transformation also changes the basis vectors of the rotation matrix. For example, P * c2w * inv(P) transforms c2w, but the basis vectors (i.e. the convention) do not change: (0, 0, 1) is still (0, 0, 1). (For more details: https://www.youtube.com/watch?v=P2LTAUO1TdA&t=662s)
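Here is a minimal numeric sketch of the change-of-basis identity behind this; P is the RDF-to-DRB permutation from the script above, and R is an arbitrary rotation chosen only for illustration:

import numpy as np

# P: RDF -> DRB permutation (same as RDF_2_DRB above); R: arbitrary illustrative c2w rotation
P = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])
a = np.deg2rad(30.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])

v = np.array([0.0, 0.0, 1.0])         # a camera-frame vector
R_new = P @ R @ np.linalg.inv(P)      # similarity transform of the rotation

# Same rotation re-expressed in the permuted basis: relabeled input maps to relabeled output.
print(np.allclose(R_new @ (P @ v), P @ (R @ v)))  # True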

I refer to the script colmap_to_mega_nerf.py provided by the author of this repo. If the dataset is in RFU:

# 1. change the c2w rotation from RFU to RDF; the basis vectors also change to RDF
c2w[:3, :3] = np.matmul(RFU_to_RDF_NUMPY, c2w[:3, :3])

# 2. change RDF to DRB; the basis vectors stay the same here because the matrix
#    is applied on both sides: still RDF
c2w[:3, :3] = np.matmul(RDF_to_DRB_NUMPY, c2w[:3, :3]) @ np.linalg.inv(RDF_to_DRB_NUMPY)

# 3. the camera position t is easy: just transform it from RFU to DRB
c2w[:3, 3] = np.matmul(RFU_to_DRB_NUMPY, c2w[:3, 3])

This leaves the camera extrinsics in the same state as in the author's example of transforming from RDF to DRB, where the basis vectors remain based on RDF.
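Since the thread never spells out these three matrices, here is how I would write them. The names follow the snippet above, but the exact definitions are my own assumption about the axis conventions (RFU = Right/Forward/Up, RDF = Right/Down/Forward, DRB = Down/Right/Backward), not code copied from colmap_to_mega_nerf.py:

import numpy as np

# Hypothetical permutation matrices (my reading of the axis names, not from the repo)
RFU_to_RDF_NUMPY = np.array([[1.0, 0.0, 0.0],
                             [0.0, 0.0, -1.0],
                             [0.0, 1.0, 0.0]])
RDF_to_DRB_NUMPY = np.array([[0.0, 1.0, 0.0],
                             [1.0, 0.0, 0.0],
                             [0.0, 0.0, -1.0]])  # same as RDF_2_DRB in the script above
RFU_to_DRB_NUMPY = np.array([[0.0, 0.0, -1.0],
                             [1.0, 0.0, 0.0],
                             [0.0, -1.0, 0.0]])

# Sanity check: RFU -> RDF -> DRB composes to the direct RFU -> DRB permutation
assert np.allclose(RDF_to_DRB_NUMPY @ RFU_to_RDF_NUMPY, RFU_to_DRB_NUMPY)

# Toy application of the three steps: identity rotation, camera at (1, 2, 3) in RFU
c2w = np.hstack([np.eye(3), np.array([[1.0], [2.0], [3.0]])])
c2w[:3, :3] = np.matmul(RFU_to_RDF_NUMPY, c2w[:3, :3])                                    # step 1
c2w[:3, :3] = np.matmul(RDF_to_DRB_NUMPY, c2w[:3, :3]) @ np.linalg.inv(RDF_to_DRB_NUMPY)  # step 2
c2w[:3, 3] = np.matmul(RFU_to_DRB_NUMPY, c2w[:3, 3])                                      # step 3
print(c2w)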

chufall commented 1 year ago

@hvkwak From lines 34-51 of https://github.com/Fyusion/LLFF/blob/c6e27b1ee59cb18f054ccb0f87a90214dbe70482/llff/poses/pose_utils.py#L51 I found that the procedure to construct c2w is the same between LLFF and Mega-NeRF, but LLFF does not apply the P @ c2w @ inv(P) step. I'm confused! Why should the transformation from RDF to DRB keep the basis vectors in RDF and use P @ c2w @ inv(P)?

Thank you very much!

hvkwak commented 1 year ago

@chufall It was a few weeks ago, so I don't remember all the details completely. I tested several scenarios including LLFF and found that both preprocessings (LLFF and Mega-NeRF) are fine. The main difference is that LLFF only changes the rotation matrix (without using P @ c2w @ inv(P)) in line 51. The ray operations that follow that line (https://github.com/bmild/nerf/blob/18b8aebda6700ed659cb27a0c348b737a5f6ab60/load_llff.py#L246) are organized consistently with this choice as well. (Tested with the toy camera frustum example I wrote above.) As I recall, these preprocessing steps are there to match the coordinate system expected by the ray functions (Mega-NeRF keeps it in DRB with an RDF basis, I guess).
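To make that concrete, here is a small sketch with an arbitrary permutation P and rotation R (illustrative only, not LLFF's or Mega-NeRF's actual matrices). It shows that a left-multiplied pose and a similarity-transformed pose, the two variants discussed above, send the same camera ray to the same world direction as long as the ray is expressed in the matching camera convention:

import numpy as np

# Illustrative permutation between two axis conventions, and an arbitrary c2w rotation
P = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0]])
a = np.deg2rad(40.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a),  np.cos(a)]])

d_old = np.array([0.0, 0.0, 1.0])  # ray direction in the old camera convention
d_new = P @ d_old                  # the same ray expressed in the new convention
world_dir = P @ (R @ d_old)        # reference: old pipeline, relabeled into the new world axes

print(np.allclose((P @ R) @ d_old, world_dir))                     # left-multiplied pose, old-convention ray
print(np.allclose((P @ R @ np.linalg.inv(P)) @ d_new, world_dir))  # similarity-transformed pose, new-convention ray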

chufall commented 1 year ago

@hvkwak Thanks a lot, I'll try it on my data.