oisilener1982 opened this issue 4 months ago
Thank you for your interest! You can check some details about the audio driven control in the supplementary materials of our paper, where we have included the relevant experimental results. @oisilener1982
Is it available right now, or if not, is there any estimated release date? I'm having fun with LivePortrait. It is so fast, unlike other projects.
I scanned the PDF paper and I can't find audio-driven control of the face like in SadTalker or Hedra, where we just input an image and audio and then generate a talking avatar. Please build an audio-driven talking avatar.
You could use an audio-to-3DMM result as the driver, or another lip-sync tool.
The experimental results can be found in Appendix C of the paper.
Due to some limitations, we are sorry that we are unable to provide this model, but you can follow the description in Appendix C to train an audio-driven model yourself :) @nitinmukesh @oisilener1982
I am just an ordinary user :( I only learned something by following tutorials from YouTube (newgenai). I might just subscribe to Hedra and combine it with SadTalker, but it would be nice if there were a talking avatar here, because this project is really fast, even faster than SadTalker.
Is it just not available yet, or is there really no possibility of having a talking head like SadTalker or Hedra? Or will this be another project?
From the paper, Appendix C (Audio-driven Portrait Animation): "We can easily extend our video-driven model to audio-driven portrait animation by regressing or generating motions, including expression deformations and head poses, from audio inputs. For instance, we use Whisper [58] to encode audio into sequential features and adopt a transformer-based framework, following FaceFormer [59], to autoregress the motions."
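For anyone trying to follow that appendix description, here is a minimal PyTorch sketch of what such an audio-to-motion autoregressor could look like. Everything below (the feature sizes, the 63 + 3 motion layout, the module structure) is my assumption for illustration; the paper only states that Whisper audio features are autoregressed into expression deformations and head poses with a FaceFormer-style transformer.

```python
# Hedged sketch of a FaceFormer-style audio-to-motion autoregressor.
# Assumed, not taken from the LivePortrait code base:
#   AUDIO_DIM  - size of the Whisper encoder features
#   MOTION_DIM - 21*3 implicit expression deltas + yaw/pitch/roll
import torch
import torch.nn as nn

AUDIO_DIM = 512
MOTION_DIM = 66
HIDDEN = 256

class AudioToMotion(nn.Module):
    def __init__(self, num_layers: int = 4, num_heads: int = 4):
        super().__init__()
        self.audio_proj = nn.Linear(AUDIO_DIM, HIDDEN)
        self.motion_proj = nn.Linear(MOTION_DIM, HIDDEN)
        layer = nn.TransformerDecoderLayer(d_model=HIDDEN, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(HIDDEN, MOTION_DIM)

    def forward(self, audio_feats, prev_motions):
        # audio_feats:  (B, T_audio, AUDIO_DIM), e.g. Whisper encoder output
        # prev_motions: (B, T_frames, MOTION_DIM), previously generated frames
        memory = self.audio_proj(audio_feats)
        tgt = self.motion_proj(prev_motions)
        # causal mask so each frame only attends to earlier frames
        T = prev_motions.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=prev_motions.device), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)  # predicted motion for the next frame at each step
```

At inference time you would presumably seed `prev_motions` with the source image's motion and append each prediction frame by frame, in the spirit of FaceFormer.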
Where can I find such an audio-to-3DMM tool?
You could use my repo LIPSICK, or DINet might be better for this, or wait for an expressive 3DMM like Media2Face.
Is Media2Face real-time or near real-time like LivePortrait? If so, we could build the pipeline much more easily.
I can't remember, that paper was a while ago. The issue is getting a good one with a license that fits what you need it for, but generally they are fast.
Was CodeTalker the earlier SOTA for audio-to-3DMM? https://github.com/Doubiiu/CodeTalker We may try that too. Ultimately I'm waiting for something like VASA-1.
I've kept a distant eye on 3DMMs, watched the project demos, and starred every one I found, but it's only from today that I'm looking at what's available with a good license. I'm still keeping an eye on emotional lip-sync papers as drivers, but they just don't seem to have good enough audio-to-lip fidelity. Are you in my Discord inbox, Tony? I see you did the Replicate for LIPSICK; we might be doing the same thing here, so we should talk just in case (Discord: Inferencer). I have sourced an audio model with multi-language support to drive LivePortrait, but it uses HuBERT, which has a bad license. I don't like DeepSpeech either; it's okay for American male spoken words but not much else.
https://github.com/user-attachments/assets/70f9ff50-8105-4d29-99c7-62b0b31f46af
Hello @Inferencer, the result from the audio-driven sample you provided is quite good. May I ask which features of the LivePortrait model you are using as the prediction target for the audio?
@zzzweakman Hi, thanks for the amazing work. For the audio-driven model, are the inputs and targets the expression output of the motion encoder plus the yaw, pitch, and roll angles? Do you include a template as in FaceFormer, and if so, do you set the template to the source image, or is the template simply the first value in the sequence of expressions/angles? Are you processing the expression tensors in any way, e.g. scaling them, before prediction?
Thank you kindly
https://github.com/user-attachments/assets/70f9ff50-8105-4d29-99c7-62b0b31f46af
This is really amazing. How did you create this? Please share with us.
@zzzweakman Hello, I have some questions about the audio-driven setup. Could you give me some advice? From the paper: "Unlike pose, we cannot explicitly control the expressions, but rather need a combination of these implicit blendshapes to achieve the desired effects."
So, if I want to train an audio-driven model, the input is obviously audio, but what should be used as the target? Should I directly use the expressions δ from the motion extractor, or the retargeted offset x → (x = x + Δ)?
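For what it's worth, here is a rough sketch of how the two target options could be collected per frame from a driving video. `extract_motion` is a hypothetical callable (not the repo's actual API) that you would implement around the motion extractor; it is assumed to return the implicit expression delta, the head pose angles, and the canonical keypoints.

```python
# Hedged sketch: building per-frame targets for an audio-driven model.
# `extract_motion(frame)` is a hypothetical wrapper around the motion
# extractor, assumed to return a dict with 'exp' (implicit expression delta),
# 'yaw'/'pitch'/'roll' (degrees), and 'x_c' (canonical keypoints).
import numpy as np

def build_targets(frames, extract_motion, use_retargeted_offset=False):
    targets = []
    for frame in frames:
        m = extract_motion(frame)                      # hypothetical helper
        pose = np.array([m["yaw"], m["pitch"], m["roll"]])
        if use_retargeted_offset:
            # Option 2: absolute driven keypoints x = x_c + delta
            exp_part = (m["x_c"] + m["exp"]).ravel()
        else:
            # Option 1: raw expression deltas from the motion extractor
            exp_part = m["exp"].ravel()
        targets.append(np.concatenate([exp_part, pose]))
    return np.stack(targets)                           # (num_frames, motion_dim)
```

Either way, normalizing the targets (e.g. per-dimension mean/std over the training set) before regression is probably worth trying.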
I've tried to autoregress exp and angles (with and without scale and t). It does not generate a normal result; in fact, the driving keypoints are totally distorted. I wonder if I am missing something here.
Regarding Inferencer_is_amazing.mp4: I think he is just using a real video to drive LivePortrait, then merging the audio of the real video into the generated one.
https://github.com/user-attachments/assets/88af8d95-2610-465e-9fff-016a34029d71
I am trying to put together a few models to achieve this, but the quality is NOT great. This is Wav2Lip after being processed by https://github.com/wangsuzhen/Audio2Head/tree/main
Notice there is no disconnect between the head-neck region and the shoulders, as the video above clearly has, and there is no transparent block around the mouth.
I do not create models (no GPU power), so I throw together a bunch of repos, and I am now trying with LivePortrait.
https://github.com/user-attachments/assets/b1951c1e-b4b5-4653-915c-1504470cba6c
Update: working on this now using LivePortrait, but still far from a GOOD result.
So you mean this is the result of another method and not LivePortrait? That is confusing, and it's not appropriate to post it under this issue.
Yes, and I am now incorporating it with LivePortrait because its quality is really good, like the update I showed above: that one was run through my first (thrown-together) method and then through LivePortrait (the quality difference is clearly visible).
So this too was irrelevant; you could not figure it out the way nitinmukesh did.
And there you go, using Audio2Head then LivePortrait:
https://github.com/user-attachments/assets/4abf16a5-cd95-4df0-9fa7-26cf9352a3a0
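For reference, a rough sketch of how the two stages described above could be chained. The Audio2Head arguments are my assumption based on its demo script (check that repo for the exact flags); the LivePortrait call follows the `-s`/`-d` usage from its README, and all paths below are placeholders.

```python
# Hedged sketch: audio -> driving video (Audio2Head), then retarget onto the
# source portrait with LivePortrait. Audio2Head flags and the stage-1 output
# location are assumptions; verify them against that repo.
import subprocess

SOURCE_IMG = "/path/to/portrait.jpg"
AUDIO_WAV  = "/path/to/speech.wav"
DRIVING    = "/path/to/Audio2Head/results/driving.mp4"  # assumed output location

# Stage 1: generate a talking-head driving video from (image, audio).
subprocess.run(
    ["python", "inference.py", "--audio_path", AUDIO_WAV, "--img_path", SOURCE_IMG],
    check=True, cwd="/path/to/Audio2Head",
)

# Stage 2: feed that clip to LivePortrait as the driving video (-s/-d per its README).
subprocess.run(
    ["python", "inference.py", "-s", SOURCE_IMG, "-d", DRIVING],
    check=True, cwd="/path/to/LivePortrait",
)
```

You would probably still need to mux the original audio back onto the LivePortrait output (e.g. with ffmpeg), since the retargeted video itself carries no sound.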
piclumen-1726303886409--pre-video.mp4
Hi @ziyaad30, can you show me how to combine Audio2Head with LivePortrait? How can I make a video like yours?
Just wondering if there is any hope of using this project to create an audio-driven talking avatar. I'm having fun with this project, but it would be nice to have talking heads.