DinoMan / speech-driven-animation


The quality of the generated video #68

Closed: bachelorY closed this issue 2 years ago

bachelorY commented 2 years ago

Hello! I ran the code to generate test.mp4, and it looks pretty good. However, when I use my own image to generate a video, something unexpected happens: the video is very blurry, and you can even see another face in it! I tried different images, but the result is the same. Are there any requirements for choosing the input picture? Thank you for looking at my issue.

DinoMan commented 2 years ago

Hi, we have only provided models trained on relatively small datasets recorded under studio conditions, and the diversity of subjects in these datasets is also quite small. As a result, although the model works for unseen subjects from within the same dataset, it alters their appearance when used on in-the-wild data (which has different backgrounds, etc.). We have chosen not to make available the model trained on LRW, which generalises well to any face. That said, among the released models the best generalisation comes from the CREMA-D model, because it has the most subjects and the greatest diversity. Although it is not as good as the LRW model, it does sometimes work well on in-the-wild photos. Maybe try the library with the CREMA-D model and see if it works better. I hope this helps.
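A minimal sketch of switching to the CREMA-D model, assuming the `sda` package's `VideoAnimator` accepts `model_path="crema"` and exposes `save_video(vid, aud, path)`, as the project's README describes for its usage examples. The helper name `animate_with_crema` and the file names are hypothetical; adjust `gpu` to your device index.

```python
def animate_with_crema(image_path: str, audio_path: str, out_path: str, gpu: int = 0) -> None:
    """Generate a talking-head video from one still image and one audio clip,
    using the CREMA-D model instead of the default model.

    Hypothetical helper; the sda calls below are assumed from the README usage.
    """
    # Imported here so the helper can be defined even where sda is not installed.
    import sda

    # model_path="crema" is assumed to select the CREMA-D weights.
    va = sda.VideoAnimator(gpu=gpu, model_path="crema")
    # The animator takes file paths and returns the video frames and audio track.
    vid, aud = va(image_path, audio_path)
    # Write the generated frames together with the audio to out_path.
    va.save_video(vid, aud, out_path)
```

For example, `animate_with_crema("my_face.jpg", "my_speech.wav", "generated_crema.mp4")`. As noted in the thread, this may improve results on in-the-wild photos only to a limited degree.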

bachelorY commented 2 years ago

Thank you for replying to my issue. I did as you said, and the quality of the generated video has improved to some extent, but it is still not satisfactory. Thank you again.