DinoMan / speech-driven-animation


Will you open source the LRW model as well? #60

Closed by himanirajora 3 years ago

himanirajora commented 3 years ago

The current models (grid, timit, crema) seem to be inaccurate even for the examples provided in the README. Will you make the LRW model available?

DinoMan commented 3 years ago

We will not be making the LRW model available. The grid, timit, and crema pretrained models should be accurate on their respective datasets, so the grid model, for example, should work well on the example. The timit and crema models will not work well on the example because they were trained under very different conditions (e.g. backgrounds, lighting), so they tend not to generalize well to other datasets.

himanirajora commented 3 years ago

I see, thanks for your response. QuickTime Player played the videos correctly; I was using VLC earlier, which played them abruptly. My use case is to animate the lips of a cartoon image based on audio. Do you have any leads on which parameters (image/audio/model) would work well? The current settings seem to adversely affect the resolution of the generated video.

DinoMan commented 3 years ago

These pretrained models would likely not work well on cartoons. They have seen only a handful of faces, and I think they will not generalize well in these cases. The crema-d model might be a bit better since it has seen the most faces, but again it will likely not be great.