Open powei-C opened 2 years ago

Hi, I wonder why you normalize the f0 series before feeding it to the f0 encoder in convert.py. However, this kind of normalization isn't applied during the preprocessing phase.
Hi, normalizing f0 aims to remove speaker characteristics. During the preprocessing phase, f0 is not normalized, but during training and inference it is normalized, as shown here: https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/dataset.py#L53 https://github.com/Wendison/VQMIVC/blob/851b4f5ca5bb60c11fea6a618affeb4979b17cf3/convert_example.py#L57
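For anyone skimming this thread, here is a minimal sketch of what such per-utterance normalization typically looks like (my illustration, not the exact code at the lines linked above; the `normalize_lf0` name and the 0-means-unvoiced convention are assumptions). The statistics are computed over voiced frames only, so the voiced contour becomes zero-mean and unit-variance, which strips the speaker's average pitch level and range:

```python
import numpy as np

def normalize_lf0(lf0: np.ndarray) -> np.ndarray:
    """Z-score log-f0 per utterance, using voiced frames only.

    Assumes unvoiced frames are marked with 0; they are left at 0 so the
    voicing decision is preserved while speaker pitch statistics are removed.
    """
    lf0 = lf0.copy()
    voiced = lf0 != 0
    if voiced.any():
        mean = lf0[voiced].mean()
        std = lf0[voiced].std()
        lf0[voiced] = (lf0[voiced] - mean) / (std + 1e-8)
    return lf0
```

One plausible reason for doing this at load time rather than in preprocessing is that the raw f0 stays on disk, so the same extracted features can be reused with or without normalization.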
Hi, thank you for your explanation!!! I have another question, about perplexity when training the model on another dataset. I found that the perplexity didn't keep increasing (I have trained for around 360 epochs in the figure). Is that reasonable? And do you have any suggestions for diagnosing this issue?
The perplexity should increase during training, as higher perplexity indicates that the vectors in the VQ codebook are distinguishable and can be used to represent different acoustic units. I also noticed that your recon_loss is high. In my experience, recon_loss should fall below 0.5 before you obtain good converted samples.
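As a reference point, perplexity here measures how evenly the codebook is used: for a codebook of N entries it ranges from 1 (all frames mapped to a single code, i.e. codebook collapse) to N (perfectly uniform usage). A minimal sketch of the standard computation over a batch of code assignments (the function name and inputs are illustrative, not the repo's exact code):

```python
import numpy as np

def codebook_perplexity(assignments: np.ndarray, num_codes: int) -> float:
    """Perplexity of VQ code usage: exp of the entropy of the empirical
    code distribution. 1 = total collapse, num_codes = uniform usage.
    """
    counts = np.bincount(assignments, minlength=num_codes)
    probs = counts / counts.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-10))
    return float(np.exp(entropy))
```

If this value plateaus far below the codebook size while recon_loss stays high, that often points at dead codebook entries rather than converged training.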