KwaiVGI / LivePortrait

Bring portraits to life!
https://liveportrait.github.io

Video Requirements for Proper Lip Movements #184

Open thatnerdyaigirl opened 1 month ago

thatnerdyaigirl commented 1 month ago

I am currently experiencing issues with achieving proper lip synchronization between the source and driving videos. There seems to be a mixture of lip movements from both videos, resulting in an unnatural appearance. I would like to request clarification and guidelines on the following points to ensure optimal results.

Questions:

  1. Source Video Composition:

Should the source video be fully zoomed in or out? Is there a specific aspect ratio that should be maintained for the source video?

  2. Lip Movements:

Should the mouth in the source video be closed initially to better match the lip movements of the driving video? If the mouth needs to be open at all, to what extent should it be open?

  3. Retargeting Code Release:

When will the retargeting code for videos be released? Currently, it seems only the Gradio interface supports retargeting for images.

  4. Source Video Requirements:

Are there any specific requirements for the source video? Should the source video be silent? Should lighting, background, or any other factors be considered?

  5. Current Issue:

I am encountering a mixture of lip movements from both the source and driving videos, leading to a weird and unnatural appearance. Any guidance on how to avoid this would be greatly appreciated.

Additional Context:

I am using the latest version of the code available from the repository. Any insights or updates on the retargeting code and the best practices for preparing source videos would be helpful.

iflamed commented 1 month ago

I encountered the same problem: the character's lips did not open the way they do in the driving video, so the lips of the character in the output video stayed basically closed.

https://github.com/user-attachments/assets/253ab98d-5e85-45f4-9806-e076295722ff

rodsott commented 1 month ago

Yes, I was facing exactly this problem, and I fixed it by doing a few things:

  • Create the first frame of your driving video with a neutral, closed mouth, no teeth showing. Even if your source videos have smiling people, if the first frame of the driver is closed, the lip sync will be much better across different mouth poses. (One way to do this is sketched after this comment.)
  • Keep the head still in the driving video if you don't want your source video to end up with a moving head, just the facial features copied.

Yes, it would be great to have retargeting features for video too, just like we have for img2vid, to fine-tune how much of the head, lip, and eye tracking is filtered and passed to the source video. =)

Please let me know if it helps you somehow.

RoD
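A minimal sketch of the first tip, using OpenCV: hold a neutral, closed-mouth still for a few frames before the original driving footage, so the first frame LivePortrait sees is neutral. The file names and hold length below are placeholders, not anything from the repo:

```python
import cv2

def prepend_neutral_frame(driving_path, neutral_image_path, out_path, hold_frames=5):
    """Write a copy of the driving video with a neutral still held at the start."""
    cap = cv2.VideoCapture(driving_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    # Hold the neutral, closed-mouth frame briefly at the very beginning.
    neutral = cv2.resize(cv2.imread(neutral_image_path), (w, h))
    for _ in range(hold_frames):
        writer.write(neutral)

    # Then copy the original driving frames unchanged.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame)

    cap.release()
    writer.release()

prepend_neutral_frame("driving.mp4", "neutral.png", "driving_neutral_start.mp4")
```

The neutral still should come from the same clip (same person, framing, and lighting) so the prepended frame doesn't introduce a jump cut.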

iflamed commented 1 month ago

Yes, I was facing exactly this problem, and I fixed it by doing a few things:

  • Create the first frame of your driving video with a neutral, closed mouth, no teeth showing. Even if your source videos have smiling people, if the first frame of the driver is closed, the lip sync will be much better across different mouth poses.
  • Keep the head still in the driving video if you don't want your source video to end up with a moving head, just the facial features copied.

Yes, it would be great to have retargeting features for video too, just like we have for img2vid, to fine-tune how much of the head, lip, and eye tracking is filtered and passed to the source video. =)

Please let me know if it helps you somehow.

RoD

Thank you, your approach has helped me solve my problem as well.

In the first frame, the mouth must be closed:

Create the first frame of your driving video with a neutral, closed mouth, no teeth showing. Even if your source videos have smiling people, if the first frame of the driver is closed, the lip sync will be much better across different mouth poses.

rodsott commented 1 month ago

Awesome @iflamed, I'm glad my research helped you! Sharing is caring! =)

x4080 commented 1 month ago

@rodsott "Another thing is to keep in the driver video the head still, if you don`t want your source video with a moving head, just facial features copied." how to do only facial feactures copied ?

rodsott commented 1 month ago

@rodsott "Another thing is to keep in the driver video the head still, if you don`t want your source video with a moving head, just facial features copied." how to do only facial feactures copied ?

Just record your driving video without moving your head, like I said. Or, if you already have a video with a moving head, stabilize it with stabilization software (e.g., Mocha, After Effects, etc.). Then you will have a video where only the eyes, lips, and jaw move, so just those features are used, without the head movement.
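If you'd rather script the stabilization than use Mocha or After Effects, here is a rough sketch with OpenCV that aligns every frame to the first frame via an ECC Euclidean transform. The file names are placeholders, and real footage may need a face mask or a proper tracker for good results; this is one simple approach, not the tool's own method:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("driving.mp4")
ok, first = cap.read()
assert ok, "could not read video"
first_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)
h, w = first_gray.shape

fps = cap.get(cv2.CAP_PROP_FPS)
writer = cv2.VideoWriter("driving_stabilized.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
writer.write(first)

# ECC convergence criteria: up to 100 iterations or an epsilon of 1e-6.
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 100, 1e-6)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    warp = np.eye(2, 3, dtype=np.float32)  # identity initialization for ECC
    try:
        # Estimate the rigid motion between the first frame and this one...
        _, warp = cv2.findTransformECC(first_gray, gray, warp,
                                       cv2.MOTION_EUCLIDEAN, criteria)
        # ...and warp the frame back onto the first frame's head pose.
        frame = cv2.warpAffine(frame, warp, (w, h),
                               flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
    except cv2.error:
        pass  # keep the frame as-is if ECC fails to converge

    writer.write(frame)

cap.release()
writer.release()
```

Because ECC aligns the whole frame, a busy or moving background can throw it off; cropping to the face region first (or passing an input mask) usually helps.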

I don't know how difficult it would be, but probably not much, for the devs to add a feature to retarget the head, lips, and eyes separately for vid2vid. That way we could fine-tune and better control how much movement each part gets from the driving video. I hope they do! ^_^

x4080 commented 1 month ago

@rodsott Thanks. I thought we could run inference with some parameter to get just the facial expression.

rodsott commented 1 month ago

@rodsott Thanks. I thought we could run inference with some parameter to get just the facial expression.

Oh, are you one of the devs here? I thought it was only cleardusk (Jianzhu Guo).. =)

Yes, I've been talking to Kijai over the weekend, testing exactly this: how to filter out some of the head scaling using his code inside ComfyUI. He made a lot of adjustments to his code to make this possible, to filter the retargeting. You can check his code here: https://github.com/kijai/ComfyUI-LivePortraitKJ/

Please let me know if I can help test your branch too. I'd be glad to contribute to it!

x4080 commented 1 month ago

@rodsott No, I'm not a dev of LivePortrait :) I'm just an enthusiast.

rodsott commented 1 month ago

lol, ok.. I thought you meant that running inference with some parameter was about you adding those parameters to the code.. hehehe. No problem, they will probably be working on vid2vid any time soon. It's indeed a powerful tool, the best so far in my opinion for facial animation, and having features to fine-tune it to its fullest will be a natural destiny for the tool. ^_^

P2Oileen commented 5 days ago

In the first frame, the mouth must be closed.

@iflamed Could you please explain why this is a must for generating proper facial expressions? Are there any methods to overcome this limitation? I know that simply concatenating a frame with a neutral expression at the very beginning of the video may solve this, but sometimes it's hard to get such a frame. I also tried recording a video with a neutral face, generating the pkl file of the first frame, and then concatenating it to the front of the other pkl file, but the generated result is really weird. Thank you for any possible help!
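For reference, a minimal sketch of that pkl-concatenation attempt, assuming the motion template is a plain pickled dict holding per-frame lists. The key names used here ('motion', 'n_frames', 'c_d_eyes_lst', 'c_d_lip_lst') are assumptions based on one version of the repo's template format, so print your own template's keys first and adapt accordingly:

```python
import pickle

def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)

neutral = load("neutral_first_frame.pkl")   # template from the neutral-face clip
driving = load("driving.pkl")               # template from the real driving clip

merged = dict(driving)
for key in ("motion", "c_d_eyes_lst", "c_d_lip_lst"):
    if key in neutral and key in driving:
        # Prepend the neutral frame(s) so the sequence starts closed-mouth.
        merged[key] = list(neutral[key]) + list(driving[key])
merged["n_frames"] = len(merged["motion"])

with open("driving_with_neutral_start.pkl", "wb") as f:
    pickle.dump(merged, f)
```

One possible, unconfirmed, explanation for the weird results: with relative motion enabled, LivePortrait appears to compute each frame's expression as an offset from the first driving frame, so splicing in a different reference frame shifts every subsequent frame's offset rather than just changing frame 0.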

iflamed commented 5 days ago

In the first frame, the mouth must be closed.

@iflamed Could you please explain why this is a must for generating proper facial expressions? Are there any methods to overcome this limitation? I know that simply concatenating a frame with a neutral expression at the very beginning of the video may solve this, but sometimes it's hard to get such a frame. I also tried recording a video with a neutral face, generating the pkl file of the first frame, and then concatenating it to the front of the other pkl file, but the generated result is really weird. Thank you for any possible help!

Sorry, I don't really understand the theory either.

P2Oileen commented 5 days ago

@iflamed Anyway, thank you so much! I raised issue #368; I hope this problem can be solved.