facebookresearch / audio2photoreal

Code and dataset for photorealistic Codec Avatars driven from audio

How to train a new model from scratch #62

Closed sj123sheng closed 6 months ago

sj123sheng commented 6 months ago

How can I train a new model from scratch? How do I generate the dataset required for training a new model? Please explain how the corresponding wav and npy files in the dataset directory are generated.

alexanderrichard commented 6 months ago

Hi, to train on a different dataset you first need a dataset. This would be a 3D multi-view capture (multiple cameras that record the person from different angles at the same time). There are some datasets like this available online.

We then do 3D body tracking and 3D face tracking separately. The face tracking is described in this paper: https://arxiv.org/abs/2207.11243

The body tracking is similar but has some additional steps. Based on these 3D recordings, you can extract keypoints and body part segmentations, and run 3D reconstruction to obtain a 3D point cloud. You then want to fit that point cloud to a body template (e.g., SMPL). This is a slightly more involved process: you need to find the right joint angles and apply them to your template mesh (running inverse kinematics) to recover the body pose. The end result is a parameterization of the body pose by the parameters of your rig. In our case, these are the joint angles you find in the pose.npy files. Texture unwrapping for the body is the same as for faces.
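To make the inverse kinematics step a bit more concrete, here is a minimal toy sketch. It is not our tracking pipeline and does not use SMPL: it assumes a hypothetical planar two-bone chain and fits its joint angles to target keypoints with plain gradient descent on a numerical gradient. All names (`forward_kinematics`, `fit_joint_angles`, the bone lengths, the step size) are made up for illustration.

```python
import math

def forward_kinematics(angles, bone_lengths):
    """Return 2D joint positions of a planar chain rooted at the origin."""
    x, y, total = 0.0, 0.0, 0.0
    joints = []
    for angle, length in zip(angles, bone_lengths):
        total += angle  # each angle is relative to the parent bone
        x += length * math.cos(total)
        y += length * math.sin(total)
        joints.append((x, y))
    return joints

def keypoint_error(angles, bone_lengths, targets):
    """Sum of squared distances between chain joints and target keypoints."""
    joints = forward_kinematics(angles, bone_lengths)
    return sum((jx - tx) ** 2 + (jy - ty) ** 2
               for (jx, jy), (tx, ty) in zip(joints, targets))

def fit_joint_angles(targets, bone_lengths, steps=2000, lr=0.05, eps=1e-6):
    """Toy inverse kinematics: gradient descent with forward differences."""
    angles = [0.0] * len(bone_lengths)  # start from a straight "rest pose"
    for _ in range(steps):
        base = keypoint_error(angles, bone_lengths, targets)
        grad = []
        for i in range(len(angles)):
            bumped = list(angles)
            bumped[i] += eps
            bumped_error = keypoint_error(bumped, bone_lengths, targets)
            grad.append((bumped_error - base) / eps)
        angles = [a - lr * g for a, g in zip(angles, grad)]
    return angles

# Hypothetical example: synthesize keypoints from known angles, then recover them.
bone_lengths = [1.0, 0.8]
true_angles = [0.6, -0.4]
targets = forward_kinematics(true_angles, bone_lengths)
fitted = fit_joint_angles(targets, bone_lengths)
residual = keypoint_error(fitted, bone_lengths, targets)
```

A real body fit works on 3D keypoints or point clouds, has many more joints, and typically needs joint limits, priors, and temporal smoothing on top of this, but the idea of optimizing rig parameters until the template matches the observations is the same.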

Long story short, creating a completely new dataset from scratch requires quite a lot of work if you don't already have the body tracking pipelines in place.

Hope that helps!