OpenTalker / StyleHEAT

[ECCV 2022] StyleHEAT: A framework for high-resolution editable talking face generation
MIT License

Image preprocessing for inference on VoxCeleb dataset #18

Closed (StelaBou closed this issue 1 year ago)

StelaBou commented 2 years ago

Hello,

I am trying to evaluate your model on the VoxCeleb dataset, but the results are poor. I preprocessed the dataset with https://github.com/AliaksandrSiarohin/video-preprocessing and ran your inference script with the --if_extract and --if-align arguments.

Is something wrong with the preprocessing of the facial images? Additionally, is your model able to handle the roll angle of the head pose?

Thank you!

FeiiYin commented 2 years ago

Since our model is ultimately trained end-to-end on the HDTF dataset, its performance depends strongly on that dataset's distribution. The roll angle in HDTF varies little, so our model may produce poor results when the driving video contains large pose changes. Training on datasets with a wider pose distribution may solve the problem.
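
One way to check whether roll is the likely cause is to measure the in-plane roll of each driving frame before inference. The following is a minimal sketch and not part of StyleHEAT: it assumes eye-center coordinates are already available from some landmark detector, and the names `roll_angle_deg` and `flag_large_roll` as well as the 15-degree threshold are hypothetical, chosen only for illustration.

```python
import math


def roll_angle_deg(left_eye, right_eye):
    """Estimate head roll in degrees from two (x, y) eye centers.

    Assumes image coordinates (x to the right, y downward) and that
    `left_eye` is the eye with the smaller x in the image. Level eyes
    give 0 degrees.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def flag_large_roll(eye_pairs, max_abs_roll_deg=15.0):
    """Return indices of frames whose absolute roll exceeds the threshold.

    `eye_pairs` is a sequence of (left_eye, right_eye) tuples, one per frame.
    The default threshold is an illustrative assumption, not a value
    taken from the StyleHEAT paper or code.
    """
    return [
        i
        for i, (left, right) in enumerate(eye_pairs)
        if abs(roll_angle_deg(left, right)) > max_abs_roll_deg
    ]


if __name__ == "__main__":
    frames = [
        ((100.0, 120.0), (160.0, 120.0)),  # level eyes
        ((100.0, 120.0), (160.0, 180.0)),  # strongly tilted head
    ]
    print(flag_large_roll(frames))
```

If most flagged frames coincide with the poor outputs, that would be consistent with the explanation above that large roll lies outside the HDTF pose distribution.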