danablend opened this issue 2 months ago
It depends on whether you need to drive an image or a video. Driving an image requires head motion, so FaceFormer wouldn't be a good fit; for driving a video, though, a static audio-driven animation like FaceFormer would be ideal.
I'm moving out of the AI/CV field, so I won't be much help, as I am now a month behind on my daily research. A quick tip for finding SOTA or recent open source: search arXiv, click "search all fields", and search for "lip"; once you've gone through those, search "avatar". A quick search just now turned up 20 results I haven't looked at, such as UniTalker, which has an excellent parameter-based audio-to-3DMM model. You will normally find a link to a project page or GitHub in the description of the arXiv entry by clicking the "more" button at the end of the description, and sometimes you will need to open the PDF and Ctrl+F "git" to see if a link is hidden in the paper.
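If you'd rather script that search than do it by hand, here is a minimal sketch using the third-party `arxiv` pip package (my assumption of a convenient wrapper, not something mentioned above); the keywords and result count are the ones suggested here:

```python
# Minimal sketch of the arXiv keyword sweep described above,
# assuming the third-party `arxiv` package (pip install arxiv).
import arxiv

client = arxiv.Client()

for keyword in ["lip", "avatar"]:
    search = arxiv.Search(
        query=keyword,                              # searches all fields by default
        max_results=20,
        sort_by=arxiv.SortCriterion.SubmittedDate,  # newest first
    )
    for result in client.results(search):
        # The project page / GitHub link usually hides in the abstract or the PDF.
        print(result.published.date(), result.title)
        print("   ", result.entry_id)
```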
We are also looking for a similar feature, @danablend. By any chance, were you able to achieve an audio-driven portrait? I would love to see your code and help you build it or fine-tune it. Do let me know.
Could you show some of the better videos, as well as the generation speed (using LipSick)?
Same here, waiting for it @danablend.
@cleardusk Thanks for your team's excellent work! Could you tell me how to control the lip features? Sometimes an opening/closing ratio of 0 can make the lips look weird. Do you have a good way to keep the lips properly still in the vid2vid case when the driving video is not moving?
I'm sure I'm not the only one who would love to use this for audio-driven video editing, particularly for lip syncing.
At the moment, I have successfully gotten it to work by chaining together @Inferencer's LipSick library with LivePortrait, and results are decent.
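Roughly, the chaining just means generating a lip-synced video with LipSick first and then using it as the driving video for LivePortrait. A sketch of the glue is below; note that both script paths and all flags are placeholders I made up, not the repos' actual CLIs, so you'd need to swap in the real entry points from each README:

```python
# Sketch of the LipSick -> LivePortrait chain. The script names and flags
# below are placeholders for whatever entry points the two repos expose;
# check each repo's README for the real arguments.
import subprocess

SOURCE_VIDEO = "talking_head.mp4"   # video to edit
AUDIO = "new_line.wav"              # audio to lip sync to
DRIVING_VIDEO = "lipsick_out.mp4"   # intermediate LipSick output
FINAL_VIDEO = "result.mp4"

# 1) LipSick: produce a lip-synced version of the source video.
subprocess.run([
    "python", "LipSick/inference.py",       # placeholder path
    "--video", SOURCE_VIDEO,
    "--audio", AUDIO,
    "--output", DRIVING_VIDEO,
], check=True)

# 2) LivePortrait: use that output as the driving video (relative motion).
subprocess.run([
    "python", "LivePortrait/inference.py",  # placeholder path
    "--source", SOURCE_VIDEO,
    "--driving", DRIVING_VIDEO,
    "--output", FINAL_VIDEO,
], check=True)
```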
For LivePortrait motion we have two options: relative motion or absolute motion.
I have opted for relative motion, because the absolute motion introduces too much video stuttering in my experience.
However, with relative motion the lip movement is not pronounced enough for me to get great results (although they are decent). This could be because LipSick's lip sync movements are not pronounced enough for LivePortrait's relative motion to perform at its best. Alternatively, we might be able to modify the LivePortrait code slightly to increase the weight of the relative motion differences, making it a little closer to absolute motion while avoiding the video stuttering.
I'm currently playing with code in this area, trying to dial in the lip movement. I'm by no means an expert here, so it's mostly trial and error:
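The gist of what I'm experimenting with is amplifying the per-frame lip deltas before they are added to the source keypoints in relative mode. This is only a sketch of the idea, not the actual LivePortrait code; the tensor names and the lip keypoint indices are assumptions you would have to match up with the pipeline's real variables:

```python
# Sketch of boosting relative lip motion. All names here (x_s, x_d_i, x_d_0,
# LIP_IDX) are stand-ins for whatever the LivePortrait pipeline actually
# calls its source/driving implicit keypoints -- adapt to the real code.
import torch

LIP_IDX = [6, 12, 14, 17, 19, 20]   # assumed indices of lip-related keypoints
LIP_GAIN = 1.6                      # >1.0 exaggerates lip motion, tune by eye

def boosted_relative_keypoints(x_s: torch.Tensor,
                               x_d_i: torch.Tensor,
                               x_d_0: torch.Tensor) -> torch.Tensor:
    """Relative-motion update with extra weight on the lip keypoints.

    x_s   : source keypoints,            shape (N, K, 3)
    x_d_i : driving keypoints, frame i,  shape (N, K, 3)
    x_d_0 : driving keypoints, frame 0,  shape (N, K, 3)
    """
    delta = x_d_i - x_d_0               # standard relative motion
    delta[:, LIP_IDX, :] *= LIP_GAIN    # amplify only the lip deltas
    return x_s + delta
```

The idea is that everything except the lips keeps the smoother relative behaviour, so the stuttering I saw with absolute motion should stay away while the mouth gets closer to the driving video.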
As an alternative to using LipSick to generate the driving video for LivePortrait, the LivePortrait authors used FaceFormer together with Whisper for audio-driven results. That might be worth a shot for highly expressive results and fast inference times.
The most successful LivePortrait configuration I have found so far for lip syncing uses the following settings:
I wanted to start this thread so we can pitch in together and get audio-driven editing to work really well!