JeremyCJM closed this issue 2 months ago.
Hi Junming,
I indeed used the original camera extrinsics. I just relocated the cameras to place the avatars at the origin, which eases visualization. It will also make life easier for some methods that need to specify a bounding box.
For GaussianAvatars, we didn't scale the camera positions.
We also explained the concrete changes here.
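For concreteness, here is a minimal sketch of that kind of relocation, assuming the poses are stored as 4x4 camera-to-world (C2W) matrices and the offset to the avatar is already known; the function and variable names are illustrative and not taken from the GaussianAvatars code.

import numpy as np

def recenter_cameras(c2w_list, avatar_center):
    """Shift every camera by -avatar_center so the avatar ends up at the origin.

    c2w_list:      list of (4, 4) camera-to-world matrices
    avatar_center: (3,) world-space point that should move to the origin
    """
    recentered = []
    for c2w in c2w_list:
        c2w = c2w.copy()
        c2w[:3, 3] -= avatar_center   # translate the camera position; rotations are untouched
        recentered.append(c2w)
    return recentered

# Example: two hypothetical cameras around an avatar located roughly at z = 1.
c2w_a = np.eye(4); c2w_a[:3, 3] = [0.5, 0.0, 1.0]
c2w_b = np.eye(4); c2w_b[:3, 3] = [-0.5, 0.0, 1.0]
shifted = recenter_cameras([c2w_a, c2w_b], avatar_center=np.array([0.0, 0.0, 1.0]))

Because only the translation column changes, the intrinsics and the relative camera geometry stay exactly as in the original NeRSemble calibration.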
Hi Shenhan,
Thanks for your prompt reply! Let me double-check that we are referring to the same data: the original extrinsics are the ones in "camera_params.json" here, right?
The original poses in "camera_params.json" are W2C matrices: https://github.com/tobias-kirschstein/nersemble/blob/7a78f0c02c7a2768242e3fd4e95ffb4bd89792dc/src/nersemble/nerfstudio/dataparser/nersemble_dataparser.py#L197
I was using a pre-distribution version of NeRSemble. The data is not organized in the same way, but I assume the contents should be the same. For the conventions of transformations, you can refer to this link.
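For reference, a minimal sketch of the convention point above, assuming the extrinsics are rigid 4x4 world-to-camera (W2C) matrices as in the linked NeRSemble dataparser; inverting such a matrix yields the camera-to-world (C2W) pose. The helper name is made up for illustration.

import numpy as np

def w2c_to_c2w(w2c):
    """Invert a rigid world-to-camera matrix to get camera-to-world.

    For a rigid transform [R | t], the inverse is [R^T | -R^T t], which is
    cheaper and numerically nicer than a general 4x4 inversion.
    """
    R = w2c[:3, :3]
    t = w2c[:3, 3]
    c2w = np.eye(4)
    c2w[:3, :3] = R.T
    c2w[:3, 3] = -R.T @ t
    return c2w

# The camera center in world coordinates is then simply c2w[:3, 3].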
Another question: in your reply at https://github.com/ShenhanQian/GaussianAvatars/issues/9#issuecomment-1961014388, there is a global translation applied to the transformed C2W:
"After we get FLAME tracking results, we add a global translation to all cameras and the FLAME mesh so that the mean position of the head in each sequence is at the origin."
- Does this mean the head position inferred from the raw extrinsics of the NeRSemble dataset is not at the origin of the world coordinate system?
- How did you get the value of the global translation?
- Is the 'transform_matrix' in "transforms_val.json" already the C2W after applying the global translation?
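For illustration, a rough sketch of how such a recentring could be computed, assuming per-frame FLAME vertices and 4x4 C2W matrices are available; this is not the authors' actual pipeline, and all names are hypothetical.

import numpy as np

def recenter_sequence(c2w_list, verts_per_frame):
    """c2w_list:        list of (4, 4) camera-to-world matrices
    verts_per_frame: list of (V, 3) FLAME vertex arrays, one per frame
    Returns recentred copies plus the global translation that was applied."""
    # Mean head position over the whole sequence (here: mean of the mesh centroids).
    head_centers = np.stack([v.mean(axis=0) for v in verts_per_frame])
    global_translation = -head_centers.mean(axis=0)   # moves the mean head position to the origin

    new_c2w = []
    for c2w in c2w_list:
        c2w = c2w.copy()
        c2w[:3, 3] += global_translation              # shift the camera centers
        new_c2w.append(c2w)

    new_verts = [v + global_translation for v in verts_per_frame]   # shift the mesh the same way
    return new_c2w, new_verts, global_translation

Under this reading, the translation is simply whatever offset moves the per-sequence mean head position to the origin.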
Looking forward to your reply!
Thanks, Junming
Yet another question: I visualized your C2W in "transforms_val.json" and the C2W computed with the align_cameras_to_axe() function in https://github.com/ShenhanQian/GaussianAvatars/issues/9#issuecomment-1961014388. It looks like your (inner) camera array spans a wider angle from the leftmost to the rightmost camera.
I am not sure whether this is due to a difference in the gram_schmidt_orthogonalization() implementation. I am using this one.
Could you share the code of gram_schmidt_orthogonalization() you are using?
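As an aside, one way to make the "wider angle" comparison concrete (a sketch, not code from either repository): measure the angle subtended at the world origin by the leftmost and rightmost camera centers of a C2W set, assuming the subject sits near the origin.

import numpy as np

def array_spread_deg(c2w_list):
    """Angle (in degrees) between the leftmost and rightmost camera centers,
    as seen from the world origin."""
    centers = np.stack([c2w[:3, 3] for c2w in c2w_list])
    azimuth = np.arctan2(centers[:, 0], centers[:, 2])        # left/right angle of each camera
    left, right = centers[azimuth.argmin()], centers[azimuth.argmax()]
    cos = np.dot(left, right) / (np.linalg.norm(left) * np.linalg.norm(right))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Running this on both C2W sets shows whether one rig really spans a wider arc.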
Your visualization isn't very intuitive to me. Here is my implementation:
import torch
import torch.nn.functional as F

def gram_schmidt_orthogonalization(M: torch.Tensor):
    """Conduct the Gram-Schmidt process to turn the column vectors of M into orthonormal bases.

    Args:
        M: a matrix (num_rows, num_cols)
    Return:
        M: a matrix with orthonormal column vectors (num_rows, num_cols)
    """
    num_rows, num_cols = M.shape
    for c in range(1, num_cols):
        # Normalize the previous and the current column, then subtract from column c
        # its projection onto all already-orthonormalized columns.
        M[:, [c - 1, c]] = F.normalize(M[:, [c - 1, c]], p=2, dim=0)
        M[:, [c]] -= M[:, :c] @ (M[:, :c].T @ M[:, [c]])
    # The last column needs one more normalization after the final projection step.
    M[:, -1] = F.normalize(M[:, -1], p=2, dim=0)
    return M
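For context, a small usage example (not from the thread) that applies the function above to re-orthonormalize a slightly noisy 3x3 rotation matrix, e.g. before putting it back into a C2W pose.

import torch

R_noisy = torch.tensor([[1.00, 0.01, 0.00],
                        [0.00, 1.00, 0.02],
                        [0.01, 0.00, 1.00]])
R_clean = gram_schmidt_orthogonalization(R_noisy.clone())   # clone: the function modifies its input in place

# R_clean.T @ R_clean should now be (numerically) the identity matrix.
print(torch.allclose(R_clean.T @ R_clean, torch.eye(3), atol=1e-5))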
Thank you so much for the detailed answers!
Hi Shenhan,
I am wondering why you did not use the original camera extrinsics from the NeRSemble dataset. Is it because the original extrinsics are not accurate? I saw in the NeRSemble GitHub repo that they have to apply a scale factor to the camera positions to make everything work.
And if so, how did you get the new extrinsics? Using COLMAP?
Thanks, Jeremy