YuDeng / Portrait-4D

Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data (CVPR 2024); Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer (ECCV 2024)
MIT License

Question on face alignment and Triplane Reconstructor #6

Open yataoz opened 6 months ago

yataoz commented 6 months ago

Hello, thanks for sharing the great work!

I found this piece of code in training/training_loop_recon_v1.py where you use loss.gen_data_by_G_syn to create synthetic images for training the Triplane Reconstructor. Since the camera params are sampled randomly, the rendered images tend to have different face sizes and are not well aligned (as opposed to the well-aligned FFHQ images you used for GenHead training). Is this intentional? Does the Triplane Reconstructor NOT rely on face-aligned input? I'm also curious how this affects the quality of the triplane reconstruction.
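
For illustration, here is a minimal sketch of the kind of random camera sampling I mean (my own example, not the actual logic inside loss.gen_data_by_G_syn): each sample draws its own yaw, pitch, and radius, so the head lands at a different scale and position in every rendered image, unlike an FFHQ-style crop.

```python
# Hypothetical sketch of per-sample random camera sampling; names and ranges
# are illustrative, not taken from the Portrait4D codebase.
import numpy as np

def sample_camera(yaw_range=(-np.pi / 3, np.pi / 3),
                  pitch_range=(-np.pi / 6, np.pi / 6),
                  radius_range=(2.2, 3.0)):
    """Draw a random look-at camera pose (cam-to-world) aimed at the head center."""
    yaw = np.random.uniform(*yaw_range)
    pitch = np.random.uniform(*pitch_range)
    radius = np.random.uniform(*radius_range)

    # Camera position on a sphere around the origin (head center).
    cam_pos = radius * np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])

    # Build an orthonormal camera basis looking at the origin.
    forward = -cam_pos / np.linalg.norm(cam_pos)
    right = np.cross(np.array([0.0, 1.0, 0.0]), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)

    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = up
    pose[:3, 2] = forward
    pose[:3, 3] = cam_pos
    return pose

# Because radius (apparent face size) and viewing angle vary per draw,
# the rendered faces are not aligned the way cropped FFHQ images are.
cams = [sample_camera() for _ in range(4)]
```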

Thanks!

YuDeng commented 6 months ago

Hi, we intentionally render synthetic images with a wide range of camera views to enhance the generalizability and robustness of the triplane reconstructor. As a result, the reconstructor can tolerate face images with different scales and positions to some degree.

Indeed, our strategy is very similar to the camera augmentation in Live3DPortrait. Our observation is that the training process takes longer to converge, but the reconstructor is less prone to overfitting.
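
As a rough illustration (my own sketch, not our actual training code), you can think of this kind of camera augmentation as jittering the rendering intrinsics so that the face appears at varying scales and image-plane offsets; the function and jitter ranges below are hypothetical:

```python
# Illustrative intrinsics jitter in the spirit of Live3DPortrait-style camera
# augmentation; values and API are assumptions, not the repository's code.
import torch

def augment_intrinsics(K, scale_jitter=0.15, shift_jitter=0.1):
    """Randomly scale/shift a normalized 3x3 intrinsics matrix K."""
    K = K.clone()
    s = 1.0 + torch.empty(()).uniform_(-scale_jitter, scale_jitter)  # zoom in/out
    dx = torch.empty(()).uniform_(-shift_jitter, shift_jitter)       # horizontal shift
    dy = torch.empty(()).uniform_(-shift_jitter, shift_jitter)       # vertical shift
    K[0, 0] *= s   # fx: changes apparent face size
    K[1, 1] *= s   # fy
    K[0, 2] += dx  # cx: moves the face horizontally in the crop
    K[1, 2] += dy  # cy: moves the face vertically in the crop
    return K

# Example with an EG3D-style normalized intrinsics matrix (illustrative values).
K = torch.tensor([[4.26, 0.00, 0.50],
                  [0.00, 4.26, 0.50],
                  [0.00, 0.00, 1.00]])
K_aug = augment_intrinsics(K)
```

Training the reconstructor on such perturbed renders means it never sees a single fixed crop, which is why it becomes tolerant to misaligned inputs at the cost of slower convergence.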