Closed yygg678 closed 2 years ago
Some points to consider:
- How much data you have for each speaker in your dataset. For example, 5 minutes per speaker is not enough to learn an nn.Embedding from scratch.
- You might need to tune the hyperparameters to balance the reconstruction loss against the adversarial loss.
I have 3 hours of data for each speaker, and my reconstruction loss is 5.4.
In that case I think you need to do some hyperparameter tuning, e.g. decrease the lambda in front of the reconstruction loss, and then maybe increase d_lr.
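To make the effect of lambda concrete, here is a minimal sketch in plain Python. It assumes the generator objective has the common form `lambda_rec * rec_loss + adv_loss`; the names `lambda_rec`, `generator_loss`, and `rec_share` are hypothetical, and the adversarial loss value is made up for illustration. Only the reconstruction loss of 5.4 comes from this thread.

```python
def generator_loss(rec_loss, adv_loss, lambda_rec):
    """Weighted generator objective (assumed form): lambda_rec * rec_loss + adv_loss."""
    return lambda_rec * rec_loss + adv_loss


def rec_share(rec_loss, adv_loss, lambda_rec):
    """Fraction of the total objective contributed by the weighted reconstruction term."""
    weighted = lambda_rec * rec_loss
    return weighted / (weighted + adv_loss)


rec_loss = 5.4  # value reported in this thread
adv_loss = 0.7  # hypothetical value, for illustration only

# Try a few hypothetical lambda values and see how much of the objective
# is taken up by the reconstruction term.
for lambda_rec in (10.0, 5.0, 1.0):
    total = generator_loss(rec_loss, adv_loss, lambda_rec)
    share = rec_share(rec_loss, adv_loss, lambda_rec)
    print(f"lambda_rec={lambda_rec:>4}: total={total:.2f}, reconstruction share={share:.1%}")
```

If the reconstruction term takes up nearly all of the objective, the adversarial signal has little influence on the generator, which is one reason lowering lambda might help here.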
Great work! I trained a model on Chinese data. The speaker embedding uses an nn.Embedding layer rather than a speaker encoder network. At test time, the similarity in the inter-gender case is very poor. Any suggestions?