MStypulkowski / diffused-heads

Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

how long is the generated video? #4

Closed by baiyuting 1 year ago

baiyuting commented 1 year ago

The paper says the model cannot generate long videos, so how long are the videos generated by the current model? Is this a common problem for all talking-head generation methods that are given only an image and an audio clip?

MStypulkowski commented 1 year ago

We managed to generate videos up to 9s long.

The problem is caused by the lack of a driving video (or any other source of movement) and by the iterative generation process, in which errors propagate from previously synthesized frames. A model trained on sequences is less likely to have this problem, but it usually requires much more computing power to handle the additional time axis. Also, achieving temporal consistency is then more challenging.
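To make the error-propagation point concrete, here is a toy sketch. It is not code from this repository and does not model the actual network: it reduces each frame to a single number and assumes every synthesis step adds a fixed small error. The function names, the per-frame error value, and the 25 fps frame rate are all hypothetical choices for illustration only.

```python
def autoregressive_drift(num_frames, per_frame_error):
    """Toy model: each frame is generated from the previously synthesized
    frame, so it inherits that frame's error and adds its own."""
    drift = 0.0
    drifts = []
    for _ in range(num_frames):
        drift += per_frame_error  # inherited error + newly introduced error
        drifts.append(drift)
    return drifts


def driven_drift(num_frames, per_frame_error):
    """Toy model: each frame is conditioned on an external driving signal,
    so the error of one frame is not carried into the next."""
    return [per_frame_error for _ in range(num_frames)]


if __name__ == "__main__":
    fps = 25                  # assumed frame rate, not taken from the paper
    num_frames = 9 * fps      # a 9 s clip under that assumption
    error = 0.01              # hypothetical per-frame error, arbitrary units

    iterative = autoregressive_drift(num_frames, error)
    driven = driven_drift(num_frames, error)
    print(f"iterative drift after {num_frames} frames: {iterative[-1]:.2f}")
    print(f"driven drift after {num_frames} frames: {driven[-1]:.2f}")
```

In this simplified picture the iterative variant drifts linearly with the number of frames while the driven variant stays at a constant error, which is one way to read why quality degrades on longer clips when there is no external source of movement.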