ShenhanQian / GaussianAvatars

[CVPR 2024 Highlight] The official repo for "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians"
https://shenhanqian.github.io/gaussian-avatars

Camera Extrinsics #41

Closed · JeremyCJM closed this 2 months ago

JeremyCJM commented 7 months ago

Hi Shenhan,

I am wondering why you did not use the original camera extrinsics from the NeRSemble dataset. Is it because the original extrinsics are not accurate? I saw in the NeRSemble GitHub repo that they have to apply a scale factor to the camera positions to make everything work.

And, if so, how did you get the new extrinsics? Using COLMAP?

Thanks, Jeremy

ShenhanQian commented 7 months ago

Hi Junming,

I did use the original camera extrinsics. I just relocated the cameras to place the avatars at the origin, which eases visualization. It also makes life easier for methods that need to specify a bounding box.

For GaussianAvatars, we didn't scale the camera positions.

We also explained the concrete changes here.
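
For readers following along, relocating cameras this way amounts to a rigid shift of the camera-to-world translations. A minimal sketch (the function and variable names are mine, not from the repo), assuming 4x4 C2W matrices:

```python
import numpy as np


def relocate_cameras(c2ws: np.ndarray, head_center: np.ndarray) -> np.ndarray:
    """Translate all C2W matrices so that `head_center` becomes the world origin.

    Args:
        c2ws: (N, 4, 4) camera-to-world matrices.
        head_center: (3,) head position in the original world frame.
    """
    c2ws = c2ws.copy()
    c2ws[:, :3, 3] -= head_center  # shift every camera center by the same offset
    return c2ws
```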

JeremyCJM commented 7 months ago

Hi Shenhan,

Thanks for your prompt reply! Let me double-check that we are referring to the same data: the original extrinsics are the ones in "camera_params.json" here, right?

The original poses in "camera_params.json" are W2C matrices: https://github.com/tobias-kirschstein/nersemble/blob/7a78f0c02c7a2768242e3fd4e95ffb4bd89792dc/src/nersemble/nerfstudio/dataparser/nersemble_dataparser.py#L197

JeremyCJM commented 7 months ago

Another question: In your reply here https://github.com/ShenhanQian/GaussianAvatars/issues/9#issuecomment-1961014388, there is a global translation applied to the transformed C2W.

> After we get FLAME tracking results, we add a global translation to all cameras and the FLAME mesh so that the mean position of the head in each sequence is at the origin.

  1. Does this mean the head position inferred from the raw extrinsics of the NeRSemble dataset is not at the origin of the world coordinate system?
  2. How did you get the value of global translation?
  3. Is the 'transform_matrix' in "transforms_val.json" already the C2W after applying global translation?

Looking forward to your reply!

Thanks, Junming

JeremyCJM commented 7 months ago

Yet another question: I visualized your C2W in "transforms_val.json" and the C2W computed with the align_cameras_to_axe() function in https://github.com/ShenhanQian/GaussianAvatars/issues/9#issuecomment-1961014388. It looks like your camera array (inner) spans a wider angle from the leftmost to the rightmost camera.

I am not sure if it is due to a difference in the gram_schmidt_orthogonalization() implementation. I am using this one.

Could you share the code of gram_schmidt_orthogonalization() you are using?

[image: visualization comparing the two camera arrays]
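
For reference, I produced the plot roughly like this (a quick sketch, assuming the usual NeRF-style "frames" layout of "transforms_val.json"):

```python
import json

import matplotlib.pyplot as plt
import numpy as np

# Load the C2W matrices from the transforms file.
with open("transforms_val.json") as f:
    frames = json.load(f)["frames"]
c2ws = np.array([frame["transform_matrix"] for frame in frames])  # (N, 4, 4)

# Camera centers are the translation column of each C2W matrix.
centers = c2ws[:, :3, 3]

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
plt.show()
```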

ShenhanQian commented 7 months ago

> Hi Shenhan,
>
> Thanks for your prompt reply! Let me double-check that we are referring to the same data: the original extrinsics are the ones in "camera_params.json" here, right?
>
> The original poses in "camera_params.json" are W2C matrices: https://github.com/tobias-kirschstein/nersemble/blob/7a78f0c02c7a2768242e3fd4e95ffb4bd89792dc/src/nersemble/nerfstudio/dataparser/nersemble_dataparser.py#L197

I was using a pre-distribution version of NeRSemble. The data is not organized in the same way, but I assume the contents should be the same. For the conventions of transformations, you can refer to this link.
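
For anyone mapping between the two releases: a minimal sketch of the usual conversion, assuming the W2C matrices follow the OpenCV camera axes (x right, y down, z forward) and the target is the OpenGL axes (x right, y up, z backward) used in NeRF-style transforms files. This is not the repo's code:

```python
import numpy as np


def w2c_opencv_to_c2w_opengl(w2c: np.ndarray) -> np.ndarray:
    """Invert a 4x4 W2C matrix and switch the camera axes from OpenCV to OpenGL."""
    c2w = np.linalg.inv(w2c)  # world-to-camera -> camera-to-world
    c2w[:3, 1:3] *= -1        # flip the y and z camera axes (OpenCV -> OpenGL)
    return c2w
```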

ShenhanQian commented 7 months ago

> Another question: In your reply here #9 (comment), there is a global translation applied to the transformed C2W.
>
> > After we get FLAME tracking results, we add a global translation to all cameras and the FLAME mesh so that the mean position of the head in each sequence is at the origin.
>
> 1. Does this mean the head position inferred from the raw extrinsics of the NeRSemble dataset is not at the origin of the world coordinate system?
> 2. How did you get the value of global translation?
> 3. Is the 'transform_matrix' in "transforms_val.json" already the C2W after applying global translation?
>
> Looking forward to your reply!
>
> Thanks, Junming

  1. Yes, the head position of a subject can shift from the origin, although not by much.
  2. I first fit FLAME to the sequence, then compute the mean translation of the fitted FLAME model (see the sketch below).
  3. Based on "export: OpenGL, camera2world" from here, yes.
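
In code, the recentering in (2) looks roughly like this (a sketch; the names are illustrative, not the repo's code):

```python
import numpy as np


def recenter_sequence(c2ws: np.ndarray, flame_translations: np.ndarray):
    """Place the sequence's mean head position at the world origin.

    Args:
        c2ws: (num_cams, 4, 4) camera-to-world matrices.
        flame_translations: (num_frames, 3) global translations from FLAME tracking.
    """
    offset = flame_translations.mean(axis=0)          # mean head position
    c2ws = c2ws.copy()
    c2ws[:, :3, 3] -= offset                          # shift all cameras ...
    flame_translations = flame_translations - offset  # ... and the head itself
    return c2ws, flame_translations
```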
ShenhanQian commented 7 months ago

> Yet another question: I visualized your C2W in "transforms_val.json" and the C2W computed with the align_cameras_to_axe() function in #9 (comment). It looks like your camera array (inner) spans a wider angle from the leftmost to the rightmost camera.
>
> I am not sure if it is due to a difference in the gram_schmidt_orthogonalization() implementation. I am using this one.
>
> Could you share the code of gram_schmidt_orthogonalization() you are using?

Your visualization isn't very intuitive to me. Here is my implementation:

```python
import torch
import torch.nn.functional as F


def gram_schmidt_orthogonalization(M: torch.Tensor):
    """Conduct the Gram-Schmidt process to turn the column vectors of M
    into orthonormal bases (M is modified in place).

    Args:
        M: A matrix (num_rows, num_cols)
    Return:
        M: A matrix with orthonormal column vectors (num_rows, num_cols)
    """
    num_rows, num_cols = M.shape
    for c in range(1, num_cols):
        # Normalize the previous column and the current one.
        M[:, [c - 1, c]] = F.normalize(M[:, [c - 1, c]], p=2, dim=0)
        # Subtract from column c its projection onto the span of
        # columns 0..c-1, which are orthonormal at this point.
        M[:, [c]] -= M[:, :c] @ (M[:, :c].T @ M[:, [c]])

    M[:, -1] = F.normalize(M[:, -1], p=2, dim=0)
    return M
```
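
As a quick sanity check, one way to exercise it (an illustrative example, not from the repo) is to re-orthonormalize the rotation block of an averaged C2W matrix, whose columns generally drift off orthonormal after averaging:

```python
import torch

c2ws = torch.randn(10, 4, 4)          # stand-in for real camera poses
R_mean = c2ws[:, :3, :3].mean(dim=0)  # averaging breaks orthonormality
R_ortho = gram_schmidt_orthogonalization(R_mean.clone())  # clone: M is modified in place

# The columns are now orthonormal: R^T R ~= I.
assert torch.allclose(R_ortho.T @ R_ortho, torch.eye(3), atol=1e-5)
```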
JeremyCJM commented 7 months ago

Thank you so much for the detailed answers!