Closed: Augnine closed this issue 4 years ago.
You can use the expert discriminator to good effect only if its eval loss is around 0.25.
Thanks for your reply. I have another question, about hyperparameters.
You chose: batch size = 64, learning rate = 1e-4, Adam optimizer with default parameters.
Is there any other important hyperparameter that can have a large effect?
Not to our knowledge.
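For reference, those settings map onto PyTorch like this (a minimal sketch; the nn.Linear model and random tensors are placeholders, not the actual SyncNet model or dataset):

```python
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model/data; real training uses SyncNet_color and its Dataset.
model = nn.Linear(512, 1)
dataset = TensorDataset(torch.randn(256, 512))

loader = DataLoader(dataset, batch_size=64, shuffle=True)  # batch size = 64
# Adam with lr = 1e-4 and PyTorch defaults: betas=(0.9, 0.999), eps=1e-8
optimizer = optim.Adam(model.parameters(), lr=1e-4)
```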
I will try it again. Thanks!
Here is one overlooked part of the dataloader code when using external datasets: https://github.com/Rudrabha/Wav2Lip/blob/master/color_syncnet_train.py#L77
The img_name and wrong_img_name are chosen randomly. The SyncNet paper says that positive and negative examples should lie within a window of 2 seconds of each other. The network might not learn anything when given a pair that is completely out of sync.
So, you might want to change that window to a random choice within 100 frames in either direction, as in the sketch below.
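As a rough sketch of that change (sample_wrong_frame is a hypothetical helper name; it assumes the repo's convention of saving frames as <frame_id>.jpg, the same format get_frame_id parses in color_syncnet_train.py):

```python
import os
import random

MAX_OFFSET = 100  # frames in either direction of the positive frame

def sample_wrong_frame(img_name, img_names, max_offset=MAX_OFFSET):
    """Pick an out-of-sync (negative) frame near the positive frame,
    instead of anywhere in the video."""
    # Frames are assumed to be named by integer id, e.g. .../123.jpg.
    frame_id = int(os.path.basename(img_name).split('.')[0])
    by_id = {int(os.path.basename(n).split('.')[0]): n for n in img_names}
    candidates = [
        by_id[i]
        for i in range(frame_id - max_offset, frame_id + max_offset + 1)
        if i in by_id and i != frame_id
    ]
    # Fall back to fully random sampling if the window is empty.
    return random.choice(candidates) if candidates else random.choice(img_names)
```

In the dataset's __getitem__, this would replace the unconstrained wrong_img_name = random.choice(img_names).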
Do you know why this would work better? We do not, since even a randomly sampled segment is out of sync. Please let us know if you have some idea.
Intuitively, with a smaller window we choose negative pairs in close temporal proximity, or even with partial overlap, more often. These are harder examples to learn from. In contrast, when sampling randomly from the entire video, you would not see such difficult examples as often. A smaller window also keeps the rest of the lower-face region similar (in most cases), which forces the network to rely on the lip area to distinguish positive from negative examples.
We will definitely try this and will add it as a suggestion if it works better. Thanks!
I am training the expert discriminator on my own dataset, but the loss stays above 0.69. I am unsure whether this model can be used for wav2lip_train.
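For context on that number: with the binary cross-entropy loss used in color_syncnet_train.py, a model that always predicts 0.5 scores ln 2 ≈ 0.693 on every sample, positive or negative, so a loss flat near 0.69 indicates chance-level performance:

```python
import math

# BCE for a prediction of 0.5 is -ln(0.5) = ln 2 regardless of the label,
# so a discriminator stuck at ~0.69 has not learned to separate
# in-sync from out-of-sync pairs.
print(math.log(2))  # 0.6931...
```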