Open loboere opened 2 years ago

When I use an image of mine, it is distorted a lot and does not look like the original face. Is there any parameter to improve the identity?
I am not sure if by "distorted" you mean the resolution issue. We trained our model at a low resolution (128x128) due to the limitations of the dataset and our computation resources. You can try modifying the resolution and retraining the model, or you could try some super-resolution methods as a post-processing step. To improve the identity, you could try reweighing the losses. Thanks.
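(For anyone who wants to try the super-resolution route, here is a minimal post-processing sketch using OpenCV's `dnn_superres` module. It assumes `opencv-contrib-python` is installed and a pre-trained ESPCN model file has been downloaded; the file names below are illustrative, not part of this repo.)

```python
# Hedged sketch: upscale each generated frame with a pre-trained
# super-resolution model (ESPCN, via OpenCV's contrib dnn_superres module).
# Assumes: pip install opencv-contrib-python, and an ESPCN_x4.pb weights file
# downloaded separately. The video file names are illustrative.
import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x4.pb")   # pre-trained 4x ESPCN weights
sr.setModel("espcn", 4)       # model name and scale must match the weights file

cap = cv2.VideoCapture("output_128.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    up = sr.upsample(frame)   # 128x128 -> 512x512
    if writer is None:
        h, w = up.shape[:2]
        writer = cv2.VideoWriter("output_512.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(up)

cap.release()
if writer is not None:
    writer.release()
```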
Yeah, I also tried this repo and noticed the identity loss issue. The character in the resulting video doesn't look like the reference image provided as input.
What do you mean by reweighing the losses, @yzyouzhang? Thanks.
Hi, I mean increasing the weight of the identity loss in the total loss. Could you also describe the issue in a little more detail or share some samples? How different is the generated image from the reference? Thanks.
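(A minimal sketch of what reweighing could look like in the training loop; the loss terms and their names here are illustrative, and the weights are examples, so check the repo's training script for the actual terms.)

```python
# Hedged sketch: give the identity term more influence in the total loss.
# recon_loss, identity_loss, adv_loss and the weights are illustrative names,
# not the exact variables used in this repo.
import torch

def total_loss(recon_loss: torch.Tensor,
               identity_loss: torch.Tensor,
               adv_loss: torch.Tensor,
               w_recon: float = 1.0,
               w_id: float = 10.0,   # raised from e.g. 1.0 to preserve identity
               w_adv: float = 0.1) -> torch.Tensor:
    return w_recon * recon_loss + w_id * identity_loss + w_adv * adv_loss
```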
Yeah, sure. Here is the results.zip file, which contains the condition.png (reference image) and the output videos with different expressions.
Are you using our pre-trained model? If so, it is trained on the CREMA-D dataset, which has a limited number of samples and a narrow data distribution, so it might not generalize well to images outside of CREMA-D. If you want it to generalize well, you may need a more extensive dataset, which is hard to find since we require emotion labels. You could try omitting the emotion labels and using the LRW dataset for better generalization.
I checked your output videos and found that the speech clips are much longer than those in our training data (~2s). The quality of the first two seconds is reasonable. This suggests the generalization ability needs to be improved, as Emre said.
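(Given the ~2s training clips, one hedged workaround is to split the input speech into ~2-second segments, run the model on each segment, and concatenate the resulting videos. A minimal splitting sketch with `soundfile`; the file names are illustrative.)

```python
# Hedged sketch: split a long speech file into ~2s chunks so each chunk
# matches the length of the training clips. File names are illustrative.
import soundfile as sf

audio, rate = sf.read("long_speech.wav")
chunk_len = 2 * rate  # 2 seconds worth of samples

for i in range(0, len(audio), chunk_len):
    chunk = audio[i:i + chunk_len]
    sf.write(f"speech_chunk_{i // chunk_len:03d}.wav", chunk, rate)
# Run the model on each chunk, then concatenate the generated videos
# (e.g. with ffmpeg's concat demuxer).
```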
For the dataset, there is a new emotional talking-face dataset released after our publication, called MEAD. Feel free to try training our model on it. I am also interested in the results.
Thanks @eeskimez and @yzyouzhang for the clarification.