KwaiVGI / LivePortrait

Bring portraits to life!
https://liveportrait.github.io
MIT License

Is there a plan to include Talking Avatar that is Audio Driven in LivePortrait? #35

Status: Open · opened by oisilener1982 2 weeks ago

oisilener1982 commented 2 weeks ago

Just wondering if there is any hope of using this project to create a talking avatar that is audio driven. I'm having fun with this project, but it would be nice to have talking heads.

zzzweakman commented 2 weeks ago

Thank you for your interest! You can check some details about the audio driven control in the supplementary materials of our paper, where we have included the relevant experimental results. @oisilener1982

oisilener1982 commented 2 weeks ago

Is it available right now, and if not, is there any estimated release date? I'm having fun with LivePortrait. It is so fast, unlike other projects.

oisilener1982 commented 2 weeks ago

I scanned the PDF paper and I can't find audio-driven control of the face like in SadTalker or Hedra, where we just input the image and audio and then generate a talking avatar.

nitinmukesh commented 2 weeks ago

Please build an audio-driven talking avatar.

Inferencer commented 2 weeks ago

> I scanned the PDF paper and i cant find audio driven control of the face just like in sadtalker or hedra wherein we just input the image and audio then generate a talking avatar

Could use an audio-to-3DMM result as the driver, or another lipsync tool.
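One way to read this suggestion: keep LivePortrait video-driven and put an audio-driven generator in front of it. A minimal sketch that just assembles the two commands without running them (the SadTalker-style script name and flags in stage 1 are illustrative assumptions; stage 2 follows the `inference.py -s ... -d ...` form from the LivePortrait README):

```python
from pathlib import Path

def build_pipeline_cmds(source_img: str, audio: str, workdir: str = "out"):
    """Assemble (without executing) a hypothetical two-stage pipeline:
    1) a lipsync tool renders a talking driving video from image + audio,
    2) LivePortrait retargets that driving video onto the portrait."""
    driving = str(Path(workdir) / "driving.mp4")
    # Stage 1: any audio-driven generator (SadTalker-style CLI, flags illustrative).
    stage1 = ["python", "sadtalker_inference.py",
              "--driven_audio", audio,
              "--source_image", source_img,
              "--result_path", driving]
    # Stage 2: LivePortrait's own inference, driven by the generated video.
    stage2 = ["python", "inference.py", "-s", source_img, "-d", driving]
    return [stage1, stage2]
```

Each list could then be handed to `subprocess.run(cmd, check=True)` once the stage-1 tool and its actual flags are pinned down.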

zzzweakman commented 2 weeks ago

> I scanned the PDF paper and i cant find audio driven control of the face just like in sadtalker or hedra wherein we just input the image and audio then generate a talking avatar

The experimental results can be found in Appendix C of the paper.

zzzweakman commented 2 weeks ago

Due to some limitations, we are sorry that we are unable to release this model. But you can follow the description in Appendix C to train an audio-driven model yourself :) @nitinmukesh @oisilener1982

oisilener1982 commented 2 weeks ago

I am just an ordinary user :( I only learned something by following the tutorials on YouTube (newgenai). I might just subscribe to Hedra and combine it with SadTalker, but it would be nice to have a talking avatar here, because this project is really fast. Even faster than SadTalker.

Is it just for now, or is there really no possibility of having a talking head like SadTalker or Hedra?

Or will this be another project? For reference, Appendix C says:

> C. Audio-driven Portrait Animation
> We can easily extend our video-driven model to audio-driven portrait animation by regressing or generating motions, including expression deformations and head poses, from audio inputs. For instance, we use Whisper [58] to encode audio into sequential features and adopt a transformer-based framework, following FaceFormer [59], to autoregress the motions
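The appendix passage above outlines the recipe: encode audio into sequential features, then autoregress per-frame motion (expression deformations plus head pose) with a transformer. A minimal PyTorch sketch of that idea, not the authors' code: every dimension and name here is an assumption (`audio_dim=384` roughly matches a small Whisper encoder, `motion_dim=69` is an arbitrary stand-in for expression + pose parameters), and the causal decoder only mimics the FaceFormer-style setup.

```python
import torch
import torch.nn as nn

class AudioToMotion(nn.Module):
    """Hypothetical audio-to-motion regressor: audio features in,
    per-frame motion (expression deformations + head pose) out,
    decoded autoregressively with a causal transformer."""

    def __init__(self, audio_dim=384, motion_dim=69, d_model=256, n_layers=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)   # audio features -> model space
        self.motion_proj = nn.Linear(motion_dim, d_model)  # previous motions -> model space
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, motion_dim)         # back to motion parameters

    def forward(self, audio_feats, prev_motion):
        # audio_feats: (B, T, audio_dim), e.g. Whisper encoder output
        # prev_motion: (B, T, motion_dim), shifted ground truth at train time
        memory = self.audio_proj(audio_feats)
        tgt = self.motion_proj(prev_motion)
        T = tgt.size(1)
        # Causal mask so frame t only attends to frames <= t.
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)
```

At inference one would decode frame by frame, feeding each predicted motion back in, and hand the resulting motion sequence to the video-driven renderer in place of motions extracted from a driving video.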

Bubarinokk commented 1 week ago

> > I scanned the PDF paper and i cant find audio driven control of the face just like in sadtalker or hedra wherein we just input the image and audio then generate a talking avatar
>
> Could use an audio to 3dmm result as driver or another lipsync tool

Where?

Inferencer commented 1 week ago

> Where?

Could use my repo LipSick, or DINet might be better for this, or wait for an expressive 3DMM like Media2Face.

tonyabracadabra commented 1 week ago

> Could use an audio to 3dmm result as driver or another lipsync tool

Is Media2Face real time or near real time like LivePortrait? If so, we can build the pipeline much more easily.

Inferencer commented 1 week ago

> Is media2face real time or near real time like live portrait? If so we can build the pipeline much easier

Can't remember, that paper was a while ago. The issue is getting a good one with a license that fits what you need it for, but generally they are fast.

tonyabracadabra commented 1 week ago

> cant remember that paper was a while ago, the issue is getting a good one with a license that you need it for but generally they are fast

Was CodeTalker the earlier SOTA for audio to 3DMM? https://github.com/Doubiiu/CodeTalker We may try that too. Ultimately I'm waiting for something like VASA-1.

Inferencer commented 1 week ago

> Was codetalker the sota earlier on audio to 3DMM? https://github.com/Doubiiu/CodeTalker we may try that too. Ultimately I’m waiting for something like vasa-1.

I've kept a distant eye on 3DMMs, watched the project demos, and starred every time I found one, but it's only from today that I'm looking at what's available with a good license. I'm still keeping an eye on emotional lip-sync papers to use as drivers, but they just don't seem to have good enough audio-to-lip fidelity. Are you in my Discord inbox, Tony? I see you did the Replicate for LipSick. We might be doing the same thing here, so we should talk just in case. Discord: Inferencer

I have sourced an audio model with multi-language support to drive LivePortrait, but it uses HuBERT, which has a bad license. I don't like DeepSpeech either; it's okay for American male spoken English but not much else.

https://github.com/user-attachments/assets/70f9ff50-8105-4d29-99c7-62b0b31f46af