eeskimez / emotalkingface

The code for the paper "Speech Driven Talking Face Generation from a Single Image and an Emotion Condition"
MIT License
165 stars · 31 forks

May I ask what speed can be achieved at test time? #9

Closed 110wuqu closed 1 year ago

yzyouzhang commented 2 years ago

For video generation, it takes roughly 17 s to generate each 30-second video at the current 128×128 resolution. However, generating N videos takes much less than 17 s × N. In detail, our generation process consists of three main parts: 1) loading the pre-trained model onto the GPU, 2) aligning the landmarks of the input image with a given template image, and 3) generating the emotional talking faces. In my experiment, the three parts took 5, 9, and 3 seconds, respectively. For longer videos, the time cost of the first two parts does not change, so generation remains quick (e.g., ~20 s for a 1-minute video).
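The amortization described above can be put into a back-of-the-envelope estimate. This is only a sketch based on the timings quoted in this thread (5 s model load, 9 s alignment, ~3 s of synthesis per 30 seconds of output); the constants and the function name are illustrative, and real timings depend on hardware and resolution:

```python
# Rough runtime estimate for the three-stage pipeline described above.
# Constants are the timings quoted in this thread (illustrative only;
# actual numbers depend on GPU, resolution, and input image).

MODEL_LOAD_S = 5.0                    # 1) load pre-trained model onto GPU (paid once)
LANDMARK_ALIGN_S = 9.0                # 2) align landmarks with the template image (paid once)
GEN_S_PER_VIDEO_SECOND = 3.0 / 30.0   # 3) ~3 s to synthesize a 30-second clip

def estimate_runtime(durations_s):
    """Estimated wall-clock seconds to generate one video per requested
    duration, amortizing the fixed load and alignment costs."""
    fixed = MODEL_LOAD_S + LANDMARK_ALIGN_S
    return fixed + sum(GEN_S_PER_VIDEO_SECOND * d for d in durations_s)

print(estimate_runtime([30]))  # one 30-second video: ~17 s, matching the thread
print(estimate_runtime([60]))  # one 1-minute video: ~20 s, matching the thread
```

Note the simplification: the alignment cost is counted once, which holds when all videos share the same input face image; a new image per video would repeat step 2.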

110wuqu commented 2 years ago

Thank you for your reply. After testing, I found that the generated video is extremely blurry and the frames jitter heavily.

110wuqu commented 2 years ago

https://user-images.githubusercontent.com/49581935/165208123-2206a0a3-6cbd-4d2c-8d2f-7930f80388b3.mp4

Here is one of my experimental results.

yzyouzhang commented 2 years ago

Thanks for sharing your results. Please refer to our results on the project webpage: https://labsites.rochester.edu/air/projects/tfaceemo.html