Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs: https://synclabs.so

How much time do you need to lip sync a 10 sec or 1 minute video? #584

Open AIhasArrived opened 1 year ago

AIhasArrived commented 1 year ago

I have been trying for the last few days with both Wav2Lip HD (not in auto) and retalker, and found that both are slow and very GPU-hungry. I would like to know from each of you HOW MUCH GPU you use (what card) and HOW MUCH time it takes you. What kind of videos/animations are you lip syncing, and for how long? (How much time to process X seconds/minutes?)

Please contribute: I am about to drop this technology and give up on it; maybe other people's experiences will give me hope. Maybe this repo is faster? (I could not try it yet.)

sahreen-haider commented 12 months ago

> I have been trying the last days with both wav2lip HD (not in auto) and retalker, and found that both are slow and very GPU consuming. […]

Well, I have only tried this repository on Colab, and it seems fine if you are merging a video of a certain length, basically up to 25-30 seconds; anything beyond that is going to take a lot of time. Given that the free version of Colab gives you a GPU (Nvidia K80/T4) with 12-16 GB of GPU RAM, TBH that's fine for a free tier. For reference, lip syncing a 1 minute video takes anywhere from ~4 to ~5 minutes.
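As a rough rule of thumb from those numbers (a hypothetical helper, not part of the repo; it just applies the ~4-5x real-time figure reported for a free Colab K80/T4):

```python
# Hypothetical estimate based on the ~4-5 minutes per 1 minute of video
# reported above on a free Colab K80/T4 GPU.
def estimated_sync_seconds(video_seconds: float, slowdown: float = 4.5) -> float:
    """Return a rough wall-clock processing time in seconds."""
    return video_seconds * slowdown

print(estimated_sync_seconds(10))   # 10 s clip  -> 45.0 s
print(estimated_sync_seconds(60))   # 1 min clip -> 270.0 s (~4.5 min)
```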

If you want to improve the model's performance, these are some of the things you can try:

  • use a lighter alternative face-detection model. The one used here is "sfd", taken from the face-alignment project (https://github.com/1adrianb/face-alignment); a faster alternative from the same project is "dlib".
  • check that your video doesn't have many cuts between frames.
  • try lower-resolution videos, since the model itself was trained on videos at 720p resolution.

davidkundrats commented 12 months ago

I have been running this model on 1080p input videos between 10-30 seconds long on my machine (RTX 3060, 12 GB VRAM) and had to set the --rescale argument of inference.py to 3 to avoid running out of memory. Generating a lip-synced clip takes a little over a minute. I also had to modify the code for the preprocessing and discriminator training scripts in order to run them locally.

If you want to get this working on your machine, I would suggest using the environment setup described here: https://github.com/natlamir/Wav2Lip-WebUI
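The downscale-by-3 trick works because memory scales with pixel count; a quick stdlib-only sketch of the arithmetic (the helper function is mine, not from the repo):

```python
# Why rescaling a 1080p input by 3 avoids out-of-memory errors:
# each frame carries ~9x fewer pixels through the pipeline.
def downscaled(width: int, height: int, factor: int) -> tuple:
    """Frame size after integer downscaling by `factor`."""
    return width // factor, height // factor

w, h = downscaled(1920, 1080, 3)
print(w, h)                      # 640 360
print((1920 * 1080) // (w * h))  # 9 -> ~9x fewer pixels per frame
```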

AIhasArrived commented 12 months ago

> Well, I have only tried this repository on Colab, and it seems fine up to roughly 25-30 seconds of video; anything beyond that is going to take a lot of time. […] If you want to improve the model's performance, these are some of the things you can try:
>
>   • use a lighter alternative face-detection model. The one used here is "sfd", taken from the face-alignment project (https://github.com/1adrianb/face-alignment); a faster alternative from the same project is "dlib".
>   • check that your video doesn't have many cuts between frames.
>   • try lower-resolution videos, since the model itself was trained on videos at 720p resolution.

Thank you, will check it out.

AIhasArrived commented 12 months ago

> I have been running this model on 1080p input videos between 10-30 seconds long on my machine (RTX 3060, 12 GB VRAM) […]

Ok, thanks, will check it; might contact you again if need be.

sahreen-haider commented 11 months ago

> Ok, thanks, will check it; might contact you again if need be.

Sure

AIhasArrived commented 11 months ago

Hello again @sahreen-haider. How do I change the model used for face recognition? That requires quite a bit of coding, no?

sahreen-haider commented 11 months ago

Hello,

The face-recognition model can be swapped for a pretrained model from another library, and yes, that will require a bit of coding.
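For what it's worth, a sketch of what that swap might look like, assuming the face-alignment package (https://github.com/1adrianb/face-alignment) and its `face_detector` argument; the names and the `dlib` backend are taken from that project's README, and wiring the detector into Wav2Lip's own pipeline would still take extra code:

```python
# Assumption-laden sketch: face_alignment accepts a face_detector argument
# ("sfd" is the heavy default that Wav2Lip bundles; "dlib" is lighter/faster).
fa = None
try:
    import face_alignment

    fa = face_alignment.FaceAlignment(
        face_alignment.LandmarksType.TWO_D,  # spelled LandmarksType._2D on older releases
        face_detector="dlib",                # swap from the default "sfd"
        device="cpu",                        # or "cuda" if available
    )
except Exception as exc:  # library or model weights not available here
    print(f"face_alignment unavailable: {exc}")
```

If the import succeeds, `fa.get_landmarks(image)` returns facial landmarks; Wav2Lip's bundled `face_detection` module would still need to be pointed at this detector.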

AIhasArrived commented 11 months ago

Is it possible to get help with that? (Maybe send me the modified version by PM if you want to keep it from spreading too much; I will only use it myself.) I just want a tool that does good lip sync. I have a nice GPU and would like to see if I can get good results, or maybe point me to other/better/different tools I could try. It's disheartening; I wish I could find the right tool.

sahreen-haider commented 11 months ago

Hey @AIhasArrived, I know it can be a little difficult to get good results from the model, since it may require some fine-tuning and parameter tweaking; you might also have to change some of the code for the underlying libraries, such as face detection, and for the GAN (if you are aiming for higher-definition output).

But that grunt work would take me significant time, and unfortunately I might not be able to do it right now.

As for alternatives: someone posted https://www.sievedata.com/functions/sieve/video_retalking as a possible solution to your problem. It is not Wav2Lip, but that issue stated this alternative could produce much better results than the existing library.

You might want to check it out.

sahreen-haider commented 11 months ago

@AIhasArrived, Connect with me over this email: sahreenhaider@gmail.com

AIhasArrived commented 11 months ago

Already did: I sent you an email a few days ago titled "Contact from github :)"

Manda69-bit commented 11 months ago

> I have been trying the last days with both wav2lip HD (not in auto) and retalker, and found that both are slow and very GPU consuming. […]

I can sync an 8 second video in about 15 seconds, and the time could improve with better parameters. But when I started, I had roughly 4x slower times, and I realized something was just wrong: the starting chunks were loading really slowly. After some research, I found the problem was a newer Torch version not working properly with the GPU. Following other threads, I tried older versions, e.g. torch==2.0.1+cu118, and my chunk-loading speed increased drastically. Hope it helps, and I hope they fix this in a new version.
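A hypothetical stdlib-only guard for that pin (the version-parsing helper is mine, not part of the repo; it just flags a torch newer than the version reported to work):

```python
# Compare an installed torch version string against the known-good pin above.
def at_most(version: str, pin: str = "2.0.1") -> bool:
    """True if `version` <= `pin`, ignoring local suffixes like +cu118."""
    def parse(v: str) -> tuple:
        return tuple(int(part) for part in v.split("+")[0].split("."))
    return parse(version) <= parse(pin)

print(at_most("2.0.1+cu118"))  # True  -> the pin reported to fix chunk loading
print(at_most("2.1.0"))        # False -> newer torch, reported slow here
```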

AIhasArrived commented 11 months ago

I have been running this model on 1080p input videos between 10-30 seconds long on my machine (rtx 3060 12gb vram) and have had to set the --rescale argument for inference.py to 3 to not run out of memory. To generate a lipsync'd clip it takes a little over a minute. I also had to modify the code in order to run this locally on my machine for the preprocessing and discriminator training scripts.

If you want to get this working on your machine I would suggest using environment setup described here: https://github.com/natlamir/Wav2Lip-WebUI

Hello @davidkundrats, I just tried this repo. It looks nice, but when I run it I hit a problem (nothing happens while the GPU is being used). Did you run into that yourself? If so, what did you do to solve it? Thanks.