TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Making MuseTalk 40% faster #173

Open mvoodarla opened 2 months ago

mvoodarla commented 2 months ago

I've been pretty impressed with MuseTalk despite some of its shortcomings and have been playing around with the model. I ended up doing a ton of optimizations that made it run 40% faster. Most of these revolved around how we load, store, and save video frames in memory during pre/post-processing, which turned out to be pretty inefficient. To that end, my company Sieve is now hosting it at a rate that's cheaper than self-hosting on GCP!

We also fixed a couple quality issues around audio silences.

We wrote about the work here and would appreciate any feedback / areas of improvement the community has noticed around the model that might be worthwhile for us to check out!

You can also just run the model directly in this playground!

dubeno commented 2 months ago

I saw your blog, very nice job! The preprocessing takes too long, and the low resolution of the teeth is a big problem. Can you share more detail on how you solved these issues?

evan-zhao-thermofisher commented 2 months ago

Hi @mvoodarla, your blog reads like a guide to making the model perfect. Would you mind explaining how you tackled the hallucination problems with silent audio? Did you just change the temperature, or did you swap in a new Whisper model? Appreciate it!

liuzysy commented 2 months ago

Thanks for your work. I'm just wondering whether you trained a new model, or used the existing checkpoint and optimized the inference part? Looking forward to your reply.

mvoodarla commented 2 months ago

Hey folks! Thanks for the notes here. We're still actively working on this model and turning it into a high-quality pipeline. More specifically, we're doing things like using CodeFormer to upscale, fixing how facial alignment is done, etc.

As for how we tackled hallucination in silent audio, one of the fixes involves first detecting the silent audio and then changing the input parameters to MuseTalk in those moments to keep the mouth shut. We hope to do a more technical post on all of these things soon!
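For illustration only, the silence-detection half of that idea could look roughly like the sketch below. This is not the Sieve implementation: the function name `silent_frames`, the 25 fps frame rate, and the -40 dBFS threshold are all assumptions, and the thread does not say which MuseTalk inputs get changed for the flagged frames.

```python
import math


def silent_frames(samples, sample_rate, fps=25, threshold_db=-40.0):
    """Return one boolean per video frame: True if that frame's audio is silent.

    samples: mono audio as floats in [-1.0, 1.0] (assumed format).
    fps and threshold_db are illustrative defaults, not values from the thread.
    """
    window = sample_rate // fps  # number of audio samples per video frame
    if window <= 0:
        raise ValueError("sample_rate must be at least fps")

    flags = []
    # Walk the audio in non-overlapping, frame-sized windows.
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        # Convert RMS to dB relative to full scale; pure zeros map to -inf.
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        flags.append(level_db < threshold_db)
    return flags
```

Frames flagged `True` would be the ones where a pipeline could substitute a closed-mouth input instead of passing the raw audio features through. A simple energy threshold like this will miss low-level background noise, so a real pipeline might prefer a proper voice activity detector.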

evan-zhao-thermofisher commented 2 months ago

Looking forward to it. @mvoodarla, you guys are doing really meaningful work.

mvoodarla commented 2 months ago

Join our Discord! Happy to share more active updates there.

https://discord.com/invite/Pnh97rvRtD