YudongGuo / AD-NeRF

This repository contains a PyTorch implementation of "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis".
MIT License
1.04k stars 176 forks

Order and Scale of the Transformation Matrix #93

Open JeremyCJM opened 2 years ago

JeremyCJM commented 2 years ago

Hi Yudong,

Thanks for the amazing work! I noticed that process_data.py manipulates the rotation matrix and translation vector as follows:

    trans = params_dict['trans'] / 10.0
    valid_num = euler_angle.shape[0]
    train_val_split = int(valid_num*10/11)
    train_ids = torch.arange(0, train_val_split)
    val_ids = torch.arange(train_val_split, valid_num)
    rot = euler2rot(euler_angle)
    rot_inv = rot.permute(0, 2, 1)
    trans_inv = -torch.bmm(rot_inv, trans.unsqueeze(2))
    pose = torch.eye(4, dtype=torch.float32)

I am wondering why you

  1. downscale the translation vector by 10: trans = params_dict['trans'] / 10.0,
  2. permute (transpose) the rotation matrix: rot_inv = rot.permute(0, 2, 1),
  3. rotate the translation vector: trans_inv = -torch.bmm(rot_inv, trans.unsqueeze(2)),
  4. flip the sign of the translation vector: trans_inv = -torch.bmm(rot_inv, trans.unsqueeze(2)).

Looking forward to hearing from you!

Thanks, Jeremy

YudongGuo commented 2 years ago

Hi,

  1. In the face tracking process, the camera space is measured in decimetres, and we convert it to metres by dividing by 10.
  2. The transformation (rotation and translation) produced by the tracking process maps canonical space to camera space. NeRF needs the camera-space-to-canonical-space transformation, so we invert it (that is what steps 2-4 do).
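
To make point 2 concrete: if tracking gives x_cam = R x + t, then solving for x gives x = R^T x_cam - R^T t, because the inverse of a rotation matrix is its transpose. The sketch below checks this numerically. It is only an illustration, not the repository's code: it uses NumPy instead of PyTorch (np.transpose(rot, (0, 2, 1)) plays the role of rot.permute(0, 2, 1), and @ the role of torch.bmm), and rotation_z, invert_rigid, and to_homogeneous are hypothetical helpers, with rotation_z standing in for euler2rot. The sample angles and translations are made up.

```python
import numpy as np


def rotation_z(angle):
    """Hypothetical stand-in for euler2rot: batch of rotations about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    zeros, ones = np.zeros_like(angle), np.ones_like(angle)
    rows = [
        np.stack([c, -s, zeros], axis=-1),
        np.stack([s, c, zeros], axis=-1),
        np.stack([zeros, zeros, ones], axis=-1),
    ]
    return np.stack(rows, axis=-2)  # shape (N, 3, 3)


def invert_rigid(rot, trans):
    """Invert x_cam = R @ x + t, giving x = R^T @ x_cam - R^T @ t."""
    rot_inv = np.transpose(rot, (0, 2, 1))      # step 2: R^-1 = R^T for a rotation
    trans_inv = -(rot_inv @ trans[:, :, None])  # steps 3-4: rotate t by R^T, then negate
    return rot_inv, trans_inv[:, :, 0]


def to_homogeneous(rot, trans):
    """Pack a batch of (R, t) pairs into 4x4 homogeneous matrices."""
    pose = np.tile(np.eye(4), (rot.shape[0], 1, 1))
    pose[:, :3, :3] = rot
    pose[:, :3, 3] = trans
    return pose


# Made-up sample data: two frames, translations given in decimetres.
angles = np.array([0.3, -1.2])
rot = rotation_z(angles)
trans = np.array([[1.0, 2.0, 3.0], [-0.5, 0.0, 4.0]]) / 10.0  # step 1: decimetres to metres

rot_inv, trans_inv = invert_rigid(rot, trans)

# Composing the inverse with the forward transform should give the identity.
product = to_homogeneous(rot_inv, trans_inv) @ to_homogeneous(rot, trans)

# Round trip for one point in the first frame: canonical -> camera -> canonical.
x = np.array([0.1, 0.2, 0.3])
x_cam = rot[0] @ x + trans[0]
x_back = rot_inv[0] @ x_cam + trans_inv[0]

print(np.allclose(product, np.eye(4)), np.allclose(x_back, x))
```

Steps 3 and 4 in the question are therefore one operation: the inverse translation is -R^T t, so the rotation and the sign flip come from the same line of code.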

JeremyCJM commented 2 years ago

Thanks a lot for your reply! For the first question, why do you want to convert the unit to meter? What is the unit in NeRF?