Rudrabha / Wav2Lip

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs:
https://synclabs.so

Tested on my trained model, result is bad #164

Closed: keetsky closed this issue 3 years ago

keetsky commented 3 years ago

[attachment: wav2lip_re0] I trained on my own data, then tested it, but the result is not good. How can I fix it?

rhulprksh commented 3 years ago

How did you train your model? Can you be more specific about your dataset?

keetsky commented 3 years ago

My dataset is similar to the LRS2 dataset. My training command is:

python -u wav2lip_train.py --data_root datas/qihanEnTrans1_vid_wav2lip_preprocessed/qihanEnTrans1_vid --checkpoint_dir checkpoints/wav2lip_v1/ --syncnet_checkpoint_path checkpoints/pretraind/lipsync_expert.pth --checkpoint_path checkpoints/pretraind/wav2lip.pth

The training results are: [attachment: 5_2]
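Before tuning training, it is worth sanity-checking that the preprocessed data matches the layout wav2lip_train.py expects. Here is a minimal sketch, assuming the usual preprocess.py output of one folder per clip holding numbered frame JPEGs plus an audio.wav; the DATA_ROOT path, the two-level LRS2-style folder nesting, and the 25-frame minimum are all illustrative assumptions:

```python
import glob
import os

# Hypothetical path: substitute your own --data_root.
DATA_ROOT = "datas/qihanEnTrans1_vid_wav2lip_preprocessed/qihanEnTrans1_vid"

bad = []
# preprocess.py writes one folder per clip containing numbered *.jpg
# frames and an audio.wav; the two-level glob assumes an LRS2-style
# speaker_id/clip_id layout.
for vid_dir in sorted(glob.glob(os.path.join(DATA_ROOT, "*", "*"))):
    if not os.path.isdir(vid_dir):
        continue
    frames = glob.glob(os.path.join(vid_dir, "*.jpg"))
    has_audio = os.path.isfile(os.path.join(vid_dir, "audio.wav"))
    # The 25-frame minimum (~1 s at 25 fps) is an arbitrary sanity bound.
    if len(frames) < 25 or not has_audio:
        bad.append(vid_dir)

print(f"{len(bad)} problematic sample folders")
for d in bad[:10]:
    print(" ", d)
```

Clips that come back empty or audio-less here will silently degrade both the expert discriminator and the generator.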

prajwalkr commented 3 years ago

Is there lip-sync in your generated result? At both train and test time?

keetsky commented 3 years ago

Is there lip-sync in your generated result? At both train and test time?

Yes. Here is the log:

Evaluating for 10 steps
L1: 0.01426870274272832, Sync loss: 1.8362067314711483
L1: 0.007411555670525717, Sync loss: 0.1841694246167722: 23it [00:47, 2.05s/it]

prajwalkr commented 3 years ago

The model is overfitting to the training data. Also, your eval Sync Loss is pretty high. The expert discriminator's eval loss should go down to ~0.25 and the Wav2Lip eval sync loss should go down to ~0.2 to get good results.
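Those targets can be turned into a concrete stopping rule. Below is a minimal monitoring sketch, not the repo's actual loop: eval_sync_loss() and save_checkpoint() are hypothetical stand-ins for the evaluation and checkpointing code in wav2lip_train.py, and the 0.2 target simply mirrors the number mentioned above:

```python
import random


def eval_sync_loss() -> float:
    """Hypothetical stand-in for the averaged eval sync loss that
    wav2lip_train.py prints; replace with the real eval loop."""
    return random.uniform(0.1, 2.0)


def save_checkpoint() -> None:
    print("checkpoint saved (stub)")  # replace with torch.save(...)


TARGET_SYNC_LOSS = 0.2  # rough target from the discussion above
PATIENCE = 10           # eval rounds without improvement before giving up

best, stale = float("inf"), 0
for round_idx in range(100):  # cap on eval rounds for this sketch
    loss = eval_sync_loss()
    if loss < best - 1e-4:
        best, stale = loss, 0
        save_checkpoint()  # keep the checkpoint with the best eval sync loss
    else:
        stale += 1
    if best <= TARGET_SYNC_LOSS:
        print(f"eval sync loss {best:.3f} hit the target; worth testing")
        break
    if stale >= PATIENCE:
        print(f"no improvement for {PATIENCE} eval rounds; likely overfitting")
        break
```

Keeping the checkpoint with the best *eval* sync loss, rather than the latest one, is what guards against the train/eval gap seen in the log above (0.18 train vs 1.84 eval).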

rebotnix commented 3 years ago

One of the reasons for the low quality is that the model uses only 96 px input resolution. Another feature that would have to be implemented is replacing the rectangular face detection with a segmentation-based one. So even when your loss drops below 0.25, I do not believe you can increase the quality that much. Hope that helps you and saves you GPU time.
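To see why the resolution bound matters independently of sync loss: the generated mouth region comes out at the model's training resolution and is then scaled back onto the full frame, so on HD footage the pasted patch is heavily upsampled. A minimal illustration with OpenCV, assuming a 96x96 generated crop (matching img_size in the repo's hparams) and a purely hypothetical face rectangle on a 1080p frame:

```python
import cv2
import numpy as np

# Fake 1080p frame and a fake 96x96 "generated" mouth crop, for illustration.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
generated_crop = np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)

# Hypothetical face box ~400 px tall on the HD frame: the 96 px crop must
# be blown up roughly 4x before being pasted back, which is where most of
# the visible softness comes from.
x1, y1, x2, y2 = 760, 340, 1160, 740
upscaled = cv2.resize(generated_crop, (x2 - x1, y2 - y1),
                      interpolation=cv2.INTER_CUBIC)
frame[y1:y2, x1:x2] = upscaled

print(f"model output: 96x96, pasted region: {x2 - x1}x{y2 - y1}")
```

The larger the detected face relative to 96 px, the larger this upscaling factor, so no amount of further sync-loss training recovers the lost detail.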

keetsky commented 3 years ago

Thanks!