mrlihellohorld opened 2 years ago
I have encountered the same problem... how do I get the cosine loss to decrease?
You need much more data: tens or hundreds of videos.
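For context, the cosine loss discussed in this thread is, as far as I can tell, a binary cross-entropy over the cosine similarity between the audio and face embeddings of the expert discriminator. A minimal NumPy sketch, with all names, shapes, and the [-1, 1] to (0, 1) mapping being my own assumptions rather than the repo's actual code; note that BCE for chance-level predictions is ln 2 ≈ 0.693, which is why an untrained or non-learning SyncNet tends to sit near 0.69:

```python
import numpy as np

def cosine_sync_loss(audio_emb, face_emb, y, eps=1e-8):
    """BCE over cosine similarity of paired embeddings.

    audio_emb, face_emb: (batch, dim) arrays; y: (batch,) labels,
    1.0 for in-sync pairs, 0.0 for off-sync pairs. Names are
    illustrative, not the repo's API.
    """
    # Cosine similarity per pair
    num = (audio_emb * face_emb).sum(axis=1)
    den = np.linalg.norm(audio_emb, axis=1) * np.linalg.norm(face_emb, axis=1)
    d = num / (den + eps)
    # Map [-1, 1] into (0, 1) so it can act as a probability for BCE
    p = np.clip((d + 1.0) / 2.0, eps, 1.0 - eps)
    # In-sync pairs push similarity toward 1, off-sync toward -1
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)).mean())
```

A loss stuck around 0.68-0.69 therefore usually means the discriminator is guessing at chance on those pairs, not that the loss itself is broken.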
Thank you, I'll have a try! Thanks for your attention again 🥰
Hi, I am planning to train the model on a different dataset whose videos have similar scenes. I am concerned that the background will affect the accuracy; how were your results? Do you have any recommendations for preprocessing to crop the video and focus on the lip area (to handle head movements), as in the LRS2 dataset?
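On the cropping question: I don't have the repo's exact preprocessing at hand, but the usual approach is to run a face detector per frame and crop a padded box around each detection (LRS2-style clips are essentially tight face crops). A sketch of just the padding/clamping arithmetic, with the detector left out and all parameter names my own:

```python
def padded_crop_box(bbox, frame_w, frame_h, pad=0.15):
    """Expand a detector bbox (x1, y1, x2, y2) by a relative margin
    and clamp it to the frame. pad=0.15 adds 15% of the box size on
    each side; tune it so the mouth region stays well inside the crop."""
    x1, y1, x2, y2 = bbox
    dx, dy = (x2 - x1) * pad, (y2 - y1) * pad
    x1, y1 = max(0, int(x1 - dx)), max(0, int(y1 - dy))
    x2, y2 = min(frame_w, int(x2 + dx)), min(frame_h, int(y2 + dy))
    return x1, y1, x2, y2
```

With OpenCV the crop itself is then just `frame[y1:y2, x1:x2]`; smoothing the box across frames (e.g. a running average) helps avoid jitter from head movement.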
Training this implementation is hopeless: 96x96 resolution will lead to poor quality in all cases. How many video samples do you have? To train a strong SyncNet (288x288, for example) you'll need hundreds of thousands of video clips.
Thanks for the quick response. I am planning to use https://github.com/deeplsd/Merkel-Podcast-Corpus; it has ~28 hours of video, but the videos are not cropped to the face itself, so I have to modify it by hand.
If you train the model on a single person, it won't have generalization ability and will behave poorly on other videos.
Such a dataset can be good if you want to train an end-to-end talking-head generation model.
Hey, have you solved the problem? I have encountered the same problem: I trained on the LRW dataset, and the loss has dropped below 0.2 on the training set but stays at 0.68-0.69 on the test set. Could you give me some advice?
Hi, may I ask some questions about the LRS2 dataset?
I want to train on datasets other than LRS2;
how could I do that?
1. What do NF and MV mean in test.txt?
6330311066473698535/00011 NF
6330311066473698535/00018 MV
2. Does "Conf" mean confidence in 00001.txt?
Text: WHEN YOU'RE COOKING CHIPS AT HOME
Conf: 4
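I don't know what NF and MV stand for either, but for training a SyncNet you typically only need the clip paths from the filelist. A sketch of parsing such lines while preserving the optional flag column (function name and return shape are my own, not from the repo):

```python
def parse_filelist(lines):
    """Split LRS2-style filelist lines into (clip_path, flag) pairs.
    The flag column (e.g. NF, MV) is optional and kept verbatim,
    since its meaning isn't documented here."""
    entries = []
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        path = parts[0]
        flag = parts[1] if len(parts) > 1 else None
        entries.append((path, flag))
    return entries
```

This keeps every clip usable for training while letting you filter on the flag later if its meaning turns out to matter.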
cropped for the face
Do I have to crop the video for the face first?
First I tried training without cropping, but the loss did not drop as expected (to around 0.2), because the model tries to generate the lower half of the image (see issue #375 for an example input image). Also, as @NikitaKononov mentioned, without cropping the effective face resolution is reduced, so again it's better to crop as tightly as you can. With the cropped images I was able to get the loss down to around 0.03.
Do you have the background problem when the training-set backgrounds aren't varied? My training-set backgrounds are blue or white, the training results are poor, and the model can't handle occlusion, such as a finger covering the face.
Hi, may I ask some questions about the LRS2 dataset?
I want to train on datasets other than LRS2; how could I do that? 1. What do NF and MV mean in test.txt? 6330311066473698535/00011 NF 6330311066473698535/00018 MV 2. Does "Conf" mean confidence in 00001.txt? Text: WHEN YOU'RE COOKING CHIPS AT HOME Conf: 4
I want to know the same thing. Did you find out what these mean?
I'm a beginner, trying to learn and work with the Wav2Lip model, and I want to train on my custom dataset, which is similar in structure to the LRS2 dataset. Could you guide me through the procedure? I have looked into the README file and am a bit confused.
Help would be very much appreciated!
Thank you for open-sourcing such a great project. I carefully followed your training method from https://github.com/Rudrabha/Wav2Lip#training-on-datasets-other-than-lrs2. First, I trained the expert discriminator on my own dataset before training Wav2Lip, but the loss has not declined. The total length of my dataset is 70 minutes, and it is divided into one-minute videos. A video sample is here: https://user-images.githubusercontent.com/31719207/192180675-e988ccc8-2ee8-4fd3-9950-9b9584364aee.mp4 Could you give me some advice?
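If it helps anyone preparing clips like this: LRS2-style pipelines generally expect short clips at 25 fps (I believe Wav2Lip's preprocessing assumes 25 fps as well, but check the repo's hparams). A sketch that builds, but does not execute, one ffmpeg command per fixed-length clip; the paths, pattern, and durations are illustrative, and `-r 25` re-encodes to 25 fps:

```python
def ffmpeg_split_cmds(src, clip_seconds=60, total_seconds=4200,
                      out_pattern="clip_{:04d}.mp4"):
    """Build one ffmpeg command per fixed-length clip of a long video.
    Commands are returned as strings, not executed; total_seconds is
    the source duration (70 min = 4200 s in the example above)."""
    cmds = []
    for i, start in enumerate(range(0, total_seconds, clip_seconds)):
        # -ss seeks to the clip start, -t limits its duration,
        # -r 25 resamples the video stream to 25 fps
        cmds.append(
            f"ffmpeg -ss {start} -i {src} -t {clip_seconds} -r 25 "
            f"{out_pattern.format(i)}"
        )
    return cmds
```

That said, per the earlier comments in this thread, 70 minutes of a single speaker is likely far too little data for the expert discriminator to converge, regardless of how the clips are cut.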