YuelangX / LatentAvatar

A PyTorch implementation of "LatentAvatar: Learning Latent Expression Code for Expressive Neural Head Avatar"
MIT License

Pose and expression disentanglement #11

Open yataoz opened 1 month ago

yataoz commented 1 month ago

Thanks for the great work!

I have a question about disentangling pose and expression. The paper doesn't mention whether the training video requires a frontal face. If the training video can contain arbitrary head poses rather than only frontal faces, how do you decouple pose from expression and ensure the latent space of the avatar autoencoder encodes only expression? If a frontal-face video is required, how do you make sure the triplane learns the subject's full geometry (e.g., profile face and back of the head)?

YuelangX commented 1 month ago

Under the monocular-video setting, the training video must show a frontal face. However, if multiview videos are used for training, only the image fed to the autoencoder needs to be frontal; the supervision used to train the avatar can come from any other view (side or back).
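
To make that concrete, here is a minimal PyTorch sketch of one such training step. All module names, tensor shapes, and the loss are placeholder assumptions for illustration, not the repo's actual code; the point is only that the encoder ever sees frontal frames, while the reconstruction loss is computed against a frame from an arbitrary camera, so head pose enters through camera parameters rather than the latent code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the expression encoder and the avatar renderer.
# Shapes and architectures are assumptions, not the repo's actual modules.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
decoder = nn.Sequential(
    nn.Linear(256 + 6, 3 * 64 * 64),          # latent code + camera params -> image
    nn.Unflatten(1, (3, 64, 64)),
)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4
)

# One training step (dummy tensors). The autoencoder input is always the
# frontal view, so the latent code cannot absorb head pose.
frontal_img = torch.randn(4, 3, 64, 64)  # frontal-view frames (encoder input)
target_img = torch.randn(4, 3, 64, 64)   # same timestamps, arbitrary view (supervision)
target_cam = torch.randn(4, 6)           # camera params of the supervising view

z_expr = encoder(frontal_img)                            # expression-only latent code
pred = decoder(torch.cat([z_expr, target_cam], dim=1))   # render from the supervising camera
loss = nn.functional.l1_loss(pred, target_img)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```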

yataoz commented 1 month ago

Thanks for the reply! Could you elaborate a bit more on how you obtained the camera pose and scale parameters for the training data? Any reference links or code would be much appreciated :)

YuelangX commented 1 month ago

I use https://github.com/YuelangX/Multiview-3DMM-Fitting to fit a BFM model to a monocular video to generate params.npz.
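
For anyone checking their own fitting output, here is a small snippet for inspecting the resulting file. The arrays stored in params.npz depend on the fitting repo's output, so rather than assuming specific key names, the code simply lists whatever is saved:

```python
import numpy as np

# Open the fitted parameter file and list its contents.
params = np.load("params.npz")
print(params.files)  # names of all stored arrays

# Print each array's name and shape to identify pose/scale parameters.
for name in params.files:
    print(name, params[name].shape)
```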