Open yataoz opened 6 months ago
Hello, thanks for sharing the great work!

I found this piece of code in training/training_loop_recon_v1.py where you use loss.gen_data_by_G_syn to create synthetic images for training the Triplane Reconstructor. Since the camera parameters are generated randomly, the rendered images tend to have different face sizes and are not well aligned (as opposed to the well-aligned FFHQ images used for GenHead training). Is this intentional? Does the Triplane Reconstructor NOT rely on face-aligned input? I'm also curious how this affects the quality of the triplane reconstruction.

Thanks!

Hi, yes, we intentionally render synthetic images with a wide range of camera views to improve the generalizability and robustness of the triplane reconstructor. As a result, the reconstructor can tolerate face images with different scales and positions to some degree.

Indeed, our strategy is very similar to the camera augmentation in Live3DPortrait. Our observation is that training takes longer to converge, but the reconstructor is less prone to overfitting.
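For clarity, here is a minimal sketch of what such random camera sampling might look like: draw a random yaw/pitch/radius around the head center and build a look-at cam2world matrix for rendering one synthetic view. The function names, angle ranges, and camera convention below are illustrative assumptions and are not taken from training/training_loop_recon_v1.py or loss.gen_data_by_G_syn.

```python
import numpy as np

def look_at(cam_pos, target=np.zeros(3), world_up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 cam2world matrix (x right, y down, z pointing at the target)."""
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, world_up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    m = np.eye(4)
    m[:3, 0], m[:3, 1], m[:3, 2], m[:3, 3] = right, down, forward, cam_pos
    return m

def sample_random_camera(rng,
                         yaw_range=(-0.6, 0.6),     # radians around the frontal view (assumed range)
                         pitch_range=(-0.3, 0.3),   # radians above/below frontal (assumed range)
                         radius_range=(2.5, 3.0)):  # distance from the head center (assumed range)
    """Sample a random camera pose on a sphere around the head center (origin)."""
    yaw = rng.uniform(*yaw_range)
    pitch = rng.uniform(*pitch_range)
    radius = rng.uniform(*radius_range)
    cam_pos = radius * np.array([
        np.sin(yaw) * np.cos(pitch),   # x: left/right
        np.sin(pitch),                 # y: up/down
        np.cos(yaw) * np.cos(pitch),   # z: toward the viewer when yaw = pitch = 0
    ])
    return look_at(cam_pos)

# Example: draw one random pose per synthetic sample before rendering.
rng = np.random.default_rng(0)
cam2world = sample_random_camera(rng)
print(cam2world.shape)  # (4, 4)
```

Keeping the pitch range well away from ±90° avoids the degenerate look-at case directly above or below the head; whether a matching jitter of the intrinsics (focal length, principal point) is also applied depends on the renderer and is omitted here.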