ShenhanQian / GaussianAvatars

[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"
https://shenhanqian.github.io/gaussian-avatars

How to convert the extrinsic to get eye view images? #69

Open Sycamoretree opened 4 days ago

Sycamoretree commented 4 days ago

Hello, first of all, thanks a lot for the wonderful work and for making it open source. I wanted to run this on a custom dataset and render eye-view videos, but I got wrong results. First, I used VHAP to preprocess a monocular video and exported it with export_as_nerf_dataset. Then I used the processed data to train GaussianAvatars. Finally, I can reconstruct a video that matches the one processed by VHAP.

I want to put the camera under the eye, but what I did didn't work:

1. Get the eye landmark from "landmark2d/STAR.npz".
2. Convert it from pixel coordinates to world coordinates; my code is below:

```python
import numpy as np

def pixel_to_camera_to_world_c2w(K, c2w, u, v, Z):
    """
    Input:
        K:    3x3 camera intrinsic matrix
        c2w:  4x4 camera-to-world (extrinsic) matrix
        u, v: pixel coordinates of the landmark
        Z:    depth of the landmark (default Z = 1)

    Output:
        converted c2w
    """
    # get the world-to-camera transform and set R, T
    w2c = np.linalg.inv(c2w)
    R = w2c[:3, :3]
    T = w2c[:3, 3]

    fx = K[0, 0]
    fy = K[1, 1]
    cx = K[0, 2]
    cy = K[1, 2]

    # back-project the pixel into camera coordinates at depth Z
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    pixel_camera = np.array([X, Y, Z])

    # transform the point into world coordinates
    P_world = R.T @ (pixel_camera - T)

    # overwrite the w2c translation with the world point, then invert back
    w2c[:3, 3] = P_world
    c2w = np.linalg.inv(w2c)
    return c2w
```
The original and converted camera translations:

```
T_ori = [-0.0067688   0.00801899  0.73578021]
T_new = [ 0.0012962   0.02223533  0.26421979]
```

Using the changed c2w instead of the original one, the rendered image does not come from the eye's perspective. The original c2w is loaded in the function readCamerasFromTransforms of scene/dataset_readers.py.

I want the camera to stay still relative to the eye. Could you give me some advice on how to adjust the camera parameters?

ShenhanQian commented 3 days ago

If I'm getting your goal right, you want to move the camera closer to an eye, essentially moving the camera along the vector $v = t_e - t_c$, where $t_c$ is the original camera location and $t_e$ is the location of an eye.

$t_c$ is available from the preprocessed data. $t_e$ can be computed with the following function to get vertices by regions: https://github.com/ShenhanQian/GaussianAvatars/blob/b799256675aea75ffd0d3aa8c69f774853e3d618/flame_model/flame.py#L869C9-L880

Here you can find the definitions of the vertex masks for the eyes: https://github.com/ShenhanQian/GaussianAvatars/blob/b799256675aea75ffd0d3aa8c69f774853e3d618/flame_model/flame.py#L797-L798
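A minimal sketch of this idea (assuming `eye_verts` are the FLAME vertices of one eye region for the current frame and `c2w` is the 4x4 camera-to-world matrix from the preprocessed data; the helper name and the `alpha` parameter are illustrative, not from the repo):

```python
import numpy as np

def move_camera_toward_eye(c2w, eye_verts, alpha=0.8):
    """Translate the camera along v = t_e - t_c while keeping its orientation.

    c2w       : (4, 4) camera-to-world matrix from the preprocessed data
    eye_verts : (N, 3) FLAME vertices of one eye region, in world space
    alpha     : fraction of the distance to cover (1.0 places the camera at the eye)
    """
    t_c = c2w[:3, 3]              # original camera location t_c
    t_e = eye_verts.mean(axis=0)  # eye location t_e, here taken as the mean of the region's vertices
    v = t_e - t_c

    c2w_new = c2w.copy()
    c2w_new[:3, 3] = t_c + alpha * v  # move the camera toward the eye; rotation stays the same
    return c2w_new
```

Since only the translation changes, the camera keeps looking in its original direction; if you move very close, you may also want to re-aim it at the eye.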

Sycamoretree commented 2 days ago

Thanks a lot, your method is great: tracking with FLAME vertices. Sorry, I didn't express my meaning clearly. What I want is to put the camera at the corner of the eye and have it follow the person's head as it moves. In other words, the camera is attached to the side of the eye, because I want to obtain the eye view of a monocular video.

Before you replied, I had thought of adjusting the camera position with 2D landmarks, but due to the lack of depth information, that method is useless; I can't even achieve head-rotation tracking. I tried using the first vertex of left_eye_region and left_eyeball in place of T in the camera extrinsic matrix, but I got a white picture. When I set T[2] = 0.3, the picture became clear, but the camera doesn't follow the head movement. I also used the rotation of FLAME to build a new rotation matrix in place of R in the camera extrinsic matrix, but the result is wrong.

The method for getting the vertex positions:

```python
# vid_left_eye[0] = 18
# vid_left_eye[287] = 3931
for index_v in [18, 3931]:
    key1 = str(index_v) + "_verts"
    self.eye_3d[key1] = verts[0][index_v].cpu().numpy().tolist()       # posed FLAME vertex
    key2 = str(index_v) + "verts_cano"
    self.eye_3d[key2] = verts_cano[0][index_v].cpu().numpy().tolist()  # canonical FLAME vertex
```

The method for getting the rotation matrix:

```python
import numpy as np

def euler_to_rotation_matrix(euler_angles):
    """Build a rotation matrix from (roll, pitch, yaw) Euler angles."""
    roll, pitch, yaw = euler_angles[0]

    R_x = np.array([[1, 0, 0],
                    [0, np.cos(roll), -np.sin(roll)],
                    [0, np.sin(roll), np.cos(roll)]])

    R_y = np.array([[np.cos(pitch), 0, np.sin(pitch)],
                    [0, 1, 0],
                    [-np.sin(pitch), 0, np.cos(pitch)]])

    R_z = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                    [np.sin(yaw), np.cos(yaw), 0],
                    [0, 0, 1]])

    R = R_z @ R_y @ R_x
    # flip the sign of the y and z rows
    R[1:3, :3] *= -1
    return R
```

Camera extrinsic matrices:


```python
# Original camera extrinsic matrix, the same at every timestep; the output images are photorealistic:
w2c = [[ 1.        ,  0.        ,  0.        , -0.0067688 ],
       [-0.        , -1.        , -0.        ,  0.00801899],
       [-0.        , -0.        , -1.        ,  0.73578021],
       [ 0.        ,  0.        ,  0.        ,  1.        ]]

# New camera extrinsic matrix, changing at every timestep with the corresponding vertex and rotation, but it outputs white images:
w2c = [[ 0.99839343, -0.05142186,  0.02379833,  0.04601083],
       [-0.05322711, -0.99514819,  0.08274636,  0.0095929 ],
       [ 0.01942789, -0.08388014, -0.99628644,  0.00293474],
       [ 0.        ,  0.        ,  0.        ,  1.        ]]
```