YuelangX / Gaussian-Head-Avatar

[CVPR 2024] Official repository for "Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians"

Some possible bugs and questions #36

Open zhoutianyang2002 opened 3 months ago

zhoutianyang2002 commented 3 months ago

Hi! Thank you for your excellent work! While reading your code to learn how to implement a 3DGS experiment, I found some possible bugs:

  1. In GaussianHeadModule.py, we want to normalize the quaternions here. However, the default dim of F.normalize is 1, not 2, and rotation has shape torch.Size([B, N, 4]), so maybe we need to pass dim=2 so that each quaternion is normalized (see the small sketch after this list).
delta_rotation = delta_attributes[:, :, 3:7] # torch.Size([B,N,4])
rotation = self.rotation.unsqueeze(0).repeat(B, 1, 1) + delta_rotation * self.attributes_scale
rotation = torch.nn.functional.normalize(rotation) 
# maybe should change it to: rotation = torch.nn.functional.normalize(rotation, dim=2)
  2. In the paper, formula (7) does not modify the Gaussian scale attribute. However, in GaussianHeadModule.py, the scales are multiplied by S from the data. Is that a mistake? Which version is correct, the paper or the code?
if 'pose' in data:
     ....
     scales = scales * S 
  3. In MeshHeadModule.py, the output of geo_mlp already passes through a tanh activation. However, when computing the vertex deformation, tanh is applied again. Is this a duplication?
self.geo_mlp = MLP(cfg.geo_mlp, last_op=nn.Tanh()) # (-1,1)
def deform(self, data):
    ...
    pred = self.geometry(geo_input) # (1,132,424)
    sdf, deform = pred[:, :1, :], pred[:, 1:4, :]
    query_pts = (query_pts + torch.tanh(deform).permute(0, 2, 1) / self.grid_res) # (1,424,3)+(1,424,3)
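To illustrate point 1, here is a minimal standalone sketch (with made-up shapes, not the repository's actual tensors) showing that the default dim normalizes along the point dimension rather than along each quaternion:

import torch
import torch.nn.functional as F

B, N = 2, 5
rotation = torch.randn(B, N, 4)  # hypothetical [B, N, 4] quaternion tensor

# Default dim=1 normalizes along the N (point) dimension, so the individual
# quaternions do not end up with unit norm.
wrong = F.normalize(rotation)            # same as dim=1
print(wrong.norm(dim=2))                 # generally not all ones

# dim=2 (or dim=-1) normalizes each 4-vector, i.e. each quaternion.
right = F.normalize(rotation, dim=2)
print(right.norm(dim=2))                 # all ones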

Besides, may I ask two questions about the code?

  1. In CameraModule.py, why did you comment out this line of code? In other words, why is there no need to change face_vertices_camera? (A sketch of my guess follows after these two questions.)
# Perhaps this converts the OpenGL coordinate system (x right, y up, z out) to the OpenCV coordinate system (x right, y down, z in)? Or the other way around? Both should be right-handed?
face_vertices_image[:, :, :, 1] = -face_vertices_image[:, :, :, 1]
# face_vertices_camera[:, :, :, 1:] = -face_vertices_camera[:, :, :, 1:]  # the original author's commented-out line
face_normals[:, :, 1:] = -face_normals[:, :, 1:]
  2. What is the difference between visible and mask? Why don't we need to use visible and mask during reenactment?
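For question 1, my guess (just a sketch with a hypothetical tensor, not the repository's code) is that negating y and z converts between the OpenGL-style and OpenCV-style camera frames, while 2D image coordinates only need the y flip:

import torch

pts_gl = torch.randn(8, 3)  # hypothetical camera-space points: x right, y up, z toward the viewer (OpenGL-style)

# Negating y and z gives the OpenCV-style camera frame (x right, y down,
# z into the scene); this is what the commented-out face_vertices_camera line would do.
pts_cv = pts_gl.clone()
pts_cv[:, 1:] = -pts_cv[:, 1:]

# For 2D image coordinates only the vertical axis differs (y-up vs y-down),
# which is why only the y of face_vertices_image is negated.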

Sorry to bother you. Thank you very much!

YuelangX commented 3 months ago
  1. Thanks. It's a mistake.

  2. It is assumed S=1 in the paper. The code is right.

  3. The second torch.tanh() should be deleted.

  4. Just because face_vertices_camera[:, :, :, 1] is not used later.

  5. I only calculate the loss for pixels where visible > 0. The mask is used to supervise the mesh geometry.
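In other words, something along the lines of this simplified sketch (not the exact training code):

import torch

def masked_rgb_loss(render, gt, visible):
    # render, gt: [B, 3, H, W] images; visible: [B, 1, H, W] rendered visibility.
    # Only pixels with visible > 0 contribute to the photometric loss.
    mask = (visible > 0).float()
    return (torch.abs(render - gt) * mask).sum() / mask.sum().clamp(min=1)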

zhoutianyang2002 commented 3 months ago

Thank you very much for your reply! May I ask another question? Since we already compute the deformation of the vertices with pose_deform_mlp, which takes the pose as input, why do we still need to transform the vertices from canonical space to pose space? In other words, what is the difference between the offsets predicted by pose_deform_mlp and the pose transformation in the code below? Thank you very much!

# in MeshHeadModule.py
if 'pose' in data: 
    R = so3_exponential_map(data['pose'][:, :3]) # (1,3,3)
    T = data['pose'][:, None, 3:] # (1,1,3)
    S = data['scale'][:, :, None] # (1,1,1)
    verts_batch = torch.bmm(verts_batch * S, R.permute(0, 2, 1)) + T
YuelangX commented 3 months ago

pose_deform_mlp predicts the offsets of the non-face points in canonical space.
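That is, the MLP offsets are applied while the points are still in canonical space, and only afterwards does the global similarity transform (S, R, T from the pose) bring everything into pose space. A simplified sketch of that order (names are illustrative, not the exact module code):

import torch

def deform_then_pose(verts_canonical, mlp_offsets, R, T, S):
    # verts_canonical: [B, V, 3] canonical-space points
    # mlp_offsets:     [B, V, 3] offsets predicted by pose_deform_mlp for the non-face points (zero elsewhere in this sketch)
    # R: [B, 3, 3], T: [B, 1, 3], S: [B, 1, 1] global similarity transform
    verts = verts_canonical + mlp_offsets                   # non-rigid deformation, still in canonical space
    verts = torch.bmm(verts * S, R.permute(0, 2, 1)) + T    # rigid map into pose space
    return verts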

zhoutianyang2002 commented 3 months ago

I understand now. Thank you very much! Best wishes!