Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs:
https://synclabs.so

Training with own dataset #66

Closed · Crazyjoedevola closed this 3 years ago

Crazyjoedevola commented 3 years ago

Hi

What should the train.txt and val.txt contain?

I have preprocessed an mp4 file, which generated the individual images in /lrs2_preprocessed/1/00001.
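
For reference, the preprocessing step I ran is the preprocess.py script from the README, along the lines of:

```
# README preprocessing command; data_root/main is the README's example layout
python preprocess.py --data_root data_root/main --preprocessed_root lrs2_preprocessed/
```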

When I run the following command:

!python color_syncnet_train.py --data_root /lrs2_preprocessed --checkpoint_dir /content/Wav2Lip/checkpoints

I get the following error if I put empty txt files in the folder:

ValueError: num_samples should be a positive integer value, but got num_samples=0

If the text files are not present, I get this:

FileNotFoundError: [Errno 2] No such file or directory: 'filelists/train.txt'

prajwalkr commented 3 years ago

The filelist should be a list of video filenames, without the extension.

In your case, 1/00001 would be one line, with similar lines for additional videos.
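
For illustration, with the layout above, filelists/train.txt could look like this (the entries other than 1/00001 are hypothetical):

```
1/00001
1/00002
2/00001
```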

Crazyjoedevola commented 3 years ago

Thanks

Should both of them contain the same values?

prajwalkr commented 3 years ago

No. One should contain the training-set videos, the other the validation-set videos.
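
A minimal sketch of generating both filelists from the preprocessed folder (this helper script and its 90/10 split are illustrative, not part of the repo):

```python
import os
import random

data_root = "lrs2_preprocessed"

# Collect every <speaker>/<video> directory as one filelist entry.
entries = []
for speaker in sorted(os.listdir(data_root)):
    speaker_dir = os.path.join(data_root, speaker)
    if not os.path.isdir(speaker_dir):
        continue
    for video in sorted(os.listdir(speaker_dir)):
        if os.path.isdir(os.path.join(speaker_dir, video)):
            entries.append(f"{speaker}/{video}")

# Shuffle, then write an assumed 90/10 train/val split
# (the ratio is not prescribed by the repo).
random.seed(0)
random.shuffle(entries)
split = max(1, int(0.9 * len(entries)))

os.makedirs("filelists", exist_ok=True)
with open("filelists/train.txt", "w") as f:
    f.write("\n".join(entries[:split]) + "\n")
with open("filelists/val.txt", "w") as f:
    f.write("\n".join(entries[split:]) + "\n")
```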

Crazyjoedevola commented 3 years ago

Hmm, I will probably figure it out :) So:

  1. I have one video which I want to train from (in order to better simulate that person's lip movement), right? This video I preprocessed into individual files.
  2. I suppose the video doesn't have to contain audio for this purpose?
  3. How long do the validation video(s) need to be, and do I have to preprocess them in the same way?

Crazyjoedevola commented 3 years ago

I am getting to this point (sorry, I am a REAL NOOB when it comes to this):

Starting Epoch: 0
0it [00:02, ?it/s]
Traceback (most recent call last):
  File "hq_wav2lip_train.py", line 443, in <module>
    nepochs=hparams.nepochs)
  File "hq_wav2lip_train.py", line 262, in train
    save_sample_images(x, g, gt, global_step, checkpoint_dir)
  File "hq_wav2lip_train.py", line 175, in save_sample_images
    if not os.path.exists(folder): os.mkdir(folder)
NotADirectoryError: [Errno 20] Not a directory: '/content/Wav2Lip/models/wav2lip_gan.pth/samples_step000000000'

prajwalkr commented 3 years ago

Please specify this as your checkpoint_dir: /content/Wav2Lip/models/
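
For context, checkpoint_dir must be a directory because the training script creates a per-step samples folder inside it, roughly like this (a sketch reconstructed from the traceback above, not the exact repo code):

```python
import os

# Passing a .pth file as checkpoint_dir is what triggers the error:
checkpoint_dir = "/content/Wav2Lip/models/wav2lip_gan.pth"

# save_sample_images joins a samples folder name onto checkpoint_dir...
folder = os.path.join(checkpoint_dir, "samples_step{:09d}".format(0))

# ...so os.mkdir raises NotADirectoryError when checkpoint_dir is a file.
if not os.path.exists(folder):
    os.mkdir(folder)
```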

Crazyjoedevola commented 3 years ago

So I think I got it started now. By the way, this is really cool stuff and I really want to try it, so I appreciate you taking the time to answer.

I ran the following, using my own dataset (which is very small to start with, in order to test it). Is this the correct syntax to train for a custom person?

!python hq_wav2lip_train.py --data_root lrs2_preprocessed/ --checkpoint_dir /content/Wav2Lip/models --syncnet_checkpoint_path /content/Wav2Lip/models/lipsync_expert.pth

So far so good. I get something like this as output:

L1: 0.16577447950839996, Sync: 0.0, Percep: 0.6806370615959167 | Fake: 0.7058212161064148, Real: 0.6555213928222656: : 1it [00:02,  3.00s/it]
Starting Epoch: 38

What do the different parameters mean? L1, Sync, Percep, Fake, Real?

For how many epochs should I train, approximately?

Crazyjoedevola commented 3 years ago

I am also curious to know what the different models do:

/content/Wav2Lip/models/checkpoint_step000000001.pth

/content/Wav2Lip/models/disc_checkpoint_step000000001.pth

Are any of these the ones I should use when running inference.py?

Crazyjoedevola commented 3 years ago

At around epoch 3000, my values are starting to look odd:

L1: 0.025442134588956833, Sync: 0.0, Percep: 27.63102149963379 | Fake: 0.0, Real: 27.63102149963379: : 1it [00:03, 3.38s/it]
Starting Epoch: 3003

Can you please help me understand this? Or is there something wrong with my training?

prajwalkr commented 3 years ago

> What do the different parameters mean? L1, Sync, Percep, Fake, Real?

> I am also curious to know what the different models do

Please check the paper linked in the README.

> Is there something wrong with my training?

Yes, we cannot really help you if you are training on a single video. You would need lots of training data, such as the LRS2 dataset.

Crazyjoedevola commented 3 years ago

OK, I can collect many more videos, but what would you say is the minimum in terms of number of videos and total length?

prajwalkr commented 3 years ago

The released models are trained on 29 hours of video data containing several thousand identities.

Crazyjoedevola commented 3 years ago

Hi

This is not really what I am thinking about. I thought it would be possible to fine-tune the model with my own data, which would be much less than the original training data. The intention is to achieve better results for the intended "target actor", but maybe that is not possible?

prajwalkr commented 3 years ago

> I thought it would be possible to fine-tune the model with my own data, which would be much less than the original training data.

Yes, that is a good idea. It can work, but we have not tried it, so we are not sure how much data is required, whether plain fine-tuning will work, etc.
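
If you do try it, one possible starting point is resuming from the released checkpoints, something like the following (untested; the flag names assume the resume options in hq_wav2lip_train.py, and the paths assume where you downloaded the released files):

```
# Untested sketch: --checkpoint_path / --disc_checkpoint_path resume the
# generator and discriminator from the released checkpoints.
!python hq_wav2lip_train.py \
    --data_root lrs2_preprocessed/ \
    --checkpoint_dir /content/Wav2Lip/models \
    --syncnet_checkpoint_path /content/Wav2Lip/models/lipsync_expert.pth \
    --checkpoint_path /content/Wav2Lip/models/wav2lip_gan.pth \
    --disc_checkpoint_path /content/Wav2Lip/models/visual_quality_disc.pth
```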

Crazyjoedevola commented 3 years ago

Thanks for the response. I will close this for now. Great project, by the way!

leijue222 commented 3 years ago

@prajwalkr How long did you train on the LRS2 dataset? And what number and type of GPUs did you use?