branaway opened 2 months ago
The lips of the woman are open in the first frame, while the man's are closed.
It seems to me that the expression in the first frame of the driving video must match the picture to animate, otherwise the lips in the resulting video are badly deformed.
Should the inference detect the initial facial expression of the video to offset the expression delta?
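To make the question concrete, this is roughly what I mean by offsetting the delta (just a sketch with made-up names and shapes, not the actual LivePortrait code):

```python
import numpy as np

def retarget_with_offset(source_exp, driving_exps):
    """Animate using deltas measured against the driving video's first frame.

    source_exp   : expression representation extracted from the source image
    driving_exps : per-frame expression representations from the driving video
    Both are assumed to be arrays of the same shape; the extraction itself is
    whatever the motion extractor already produces.
    """
    exp0 = driving_exps[0]              # initial expression of the driving video
    out = []
    for exp_t in driving_exps:
        delta = exp_t - exp0            # change relative to the first frame,
        out.append(source_exp + delta)  # applied on top of the source expression
    return np.stack(out)
```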
Thank you for your attention and feedback. Ensuring that the first frame of the driving video features a frontal face with a neutral expression will yield better results.
We have provided some guidelines on creating an effective driving video here : )
Thank you so much for pointing out the obvious :). The reason I did not use a full video for driving is that I'd like to change the expression of a picture with an arbitrary expression (neutral/happy/sad) to the expression in another photo. I'm interested in morphing one photo only. Is there a recommendation for that? Thanks a lot.
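Conceptually, this is the kind of morph I'm after (a toy sketch with placeholder arrays, just to illustrate the idea; the expression representation would be whatever the model actually uses):

```python
import numpy as np

def blend_expression(source_exp: np.ndarray, target_exp: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Move the source photo's expression toward the reference photo's expression.

    The arrays stand in for whatever expression representation is extracted from
    each image; strength=1.0 copies the reference expression fully, smaller values
    give a partial morph.
    """
    return source_exp + strength * (target_exp - source_exp)

# Toy example with made-up keypoint arrays, just to show the arithmetic.
source_exp = np.zeros((21, 3))             # e.g. extracted from the photo to animate
target_exp = np.random.randn(21, 3) * 0.1  # e.g. extracted from the reference photo
morphed = blend_expression(source_exp, target_exp, strength=0.7)
```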
"Thank you for your attention and feedback. Ensuring that the first frame of the driving video features a frontal face with a neutral expression will yield better results."
This is my experience.
Quite a few people on here are talking about aligning the expressions to get better results, but I haven't seen that make a difference at all. What matters most is what is explained above: a neutral face, facing forward. The most common "error" I get is an output where mainly the mouth acts derpy, with the lips pressed tightly together in an unnatural way.
I think the quality could be improved if there were an option to use the best frame in a driving video instead of the first frame. I'll put that in a separate feature request for better visibility: https://github.com/KwaiVGI/LivePortrait/issues/312
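Until that exists, a rough offline workaround is to score every frame for "frontal and neutral" and re-cut the driving clip to start at the best one. Here is a sketch using MediaPipe Face Mesh (the landmark indices follow the standard Face Mesh convention, but the scoring weights are just guesses, not tuned values):

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

def frame_score(landmarks):
    """Lower is better: penalize an open mouth and a turned head."""
    lm = landmarks.landmark
    mouth_open = abs(lm[13].y - lm[14].y)       # inner-lip gap, ~0 when the mouth is closed
    eye_mid_x = (lm[33].x + lm[263].x) / 2.0    # midpoint between outer eye corners
    yaw_proxy = abs(lm[1].x - eye_mid_x)        # nose tip off-center suggests a head turn
    return mouth_open + 2.0 * yaw_proxy

def best_start_frame(video_path):
    """Return the index of the most frontal, most neutral frame in the video."""
    cap = cv2.VideoCapture(video_path)
    best_idx, best_score, idx = 0, float("inf"), 0
    with mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1) as mesh:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            res = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if res.multi_face_landmarks:
                score = frame_score(res.multi_face_landmarks[0])
                if score < best_score:
                    best_idx, best_score = idx, score
            idx += 1
    cap.release()
    return best_idx

# e.g. print(best_start_frame("driving.mp4")) and trim the clip to start at that frame.
```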