MingjieChen / DYGANVC

Demo page: https://MingjieChen.github.io/dygan-vc

train Chinese data #8

Closed: yygg678 closed this issue 2 years ago

yygg678 commented 2 years ago

Great work! I trained a model on Chinese data. For the speaker embedding I use an nn.Embedding layer instead of the speaker encoder network. At test time, the speaker similarity for inter-gender conversion is very poor. Any suggestions?
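For context on the setup described above: PyTorch's `torch.nn.Embedding` is essentially a trainable lookup table with one row per speaker ID. The sketch below is a hypothetical pure-Python stand-in (the class `SpeakerEmbeddingTable` and its methods are illustrative, not DYGANVC code). It assumes plain SGD with no momentum or weight decay, and shows that an update for one speaker touches only that speaker's row, so each vector is learned only from that speaker's own utterances.

```python
import random


class SpeakerEmbeddingTable:
    """Hypothetical pure-Python stand-in for a learned speaker lookup table.

    Each speaker ID owns one trainable row, analogous to torch.nn.Embedding.
    """

    def __init__(self, num_speakers, dim, seed=0):
        rng = random.Random(seed)
        # Random initialisation: every row starts with no speaker information.
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_speakers)]

    def lookup(self, speaker_id):
        # Forward pass: return a copy of the row for this speaker.
        return list(self.weight[speaker_id])

    def sgd_step(self, speaker_id, grad, lr=0.1):
        # Plain SGD (assumed): only the row that was looked up receives a
        # gradient, so other speakers' vectors are left unchanged.
        row = self.weight[speaker_id]
        for i, g in enumerate(grad):
            row[i] -= lr * g


table = SpeakerEmbeddingTable(num_speakers=4, dim=3)
table.sgd_step(2, [1.0, 1.0, 1.0])  # only speaker 2's vector moves
```

This is why the amount of audio per speaker matters when the table is learned from scratch: a speaker's row only improves as fast as that speaker's data allows.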

MingjieChen commented 2 years ago

Some points to consider:

  1. How much data do you have for each speaker? For example, 5 minutes is not enough to learn an nn.Embedding from scratch.
  2. You might need to tune the hyperparameters to balance the reconstruction loss and the adversarial loss.
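Point 2 can be made concrete by writing the generator objective as a weighted sum, `lambda_rec * L_rec + lambda_adv * L_adv`. The sketch below assumes this form; the function and parameter names are hypothetical and the loss values are made up, not DYGANVC defaults. It computes what fraction of the objective's value comes from the adversarial term. This is only a rough proxy, since it compares loss values rather than gradients.

```python
def generator_objective(rec_loss, adv_loss, lambda_rec, lambda_adv=1.0):
    """Hypothetical weighted sum of the two generator loss terms."""
    return lambda_rec * rec_loss + lambda_adv * adv_loss


def adversarial_share(rec_loss, adv_loss, lambda_rec, lambda_adv=1.0):
    """Fraction of the objective's value contributed by the adversarial term."""
    total = generator_objective(rec_loss, adv_loss, lambda_rec, lambda_adv)
    return lambda_adv * adv_loss / total


# Illustrative values only: lowering lambda_rec gives the adversarial
# term a larger relative weight in the combined objective.
for lam in (10.0, 1.0):
    print(lam, round(adversarial_share(5.0, 1.0, lam), 3))
```

With a large reconstruction weight, the generator is mostly rewarded for reproducing its input, which can leave little pressure toward the target speaker's characteristics.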
yygg678 commented 2 years ago

I have 3 hours of data for each speaker, and my reconstruction loss is 5.4.
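For a sense of scale relative to the 5-minute example above, the hypothetical helper below converts audio duration into an approximate count of mel-spectrogram frames. The 24 kHz sample rate and hop length of 300 are assumptions for illustration, not DYGANVC's settings. The 36x ratio between 3 hours and 5 minutes holds for any frame settings.

```python
def mel_frames(duration_seconds, sample_rate=24000, hop_length=300):
    """Approximate number of mel-spectrogram frames for a given audio duration.

    sample_rate and hop_length are illustrative assumptions, not DYGANVC settings.
    """
    frames_per_second = sample_rate / hop_length
    return int(duration_seconds * frames_per_second)


three_hours = mel_frames(3 * 60 * 60)
five_minutes = mel_frames(5 * 60)
print(three_hours, five_minutes, three_hours // five_minutes)  # 864000 24000 36
```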

MingjieChen commented 2 years ago

In that case, I think you need to do some hyperparameter tuning: for example, decrease the lambda weighting the reconstruction loss, and then perhaps increase d_lr.
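One way to organize the suggested tuning is a small grid that only lowers the reconstruction weight and only raises d_lr. This is a sketch: the config keys (`lambda_rec`, `d_lr`, `g_lr`), the baseline values, and the scale factors are hypothetical placeholders, not the actual DYGANVC configuration.

```python
from itertools import product

# Hypothetical baseline: key names and values are placeholders,
# not the actual DYGANVC config.
BASELINE = {"lambda_rec": 10.0, "d_lr": 1e-4, "g_lr": 1e-4}


def sweep(baseline, rec_scales=(1.0, 0.5, 0.25), d_lr_scales=(1.0, 2.0)):
    """Yield configs that lower the reconstruction weight and raise d_lr."""
    for rec_scale, d_scale in product(rec_scales, d_lr_scales):
        cfg = dict(baseline)  # copy so the baseline is never mutated
        cfg["lambda_rec"] = baseline["lambda_rec"] * rec_scale
        cfg["d_lr"] = baseline["d_lr"] * d_scale
        cfg["name"] = f"rec{rec_scale:g}_dlr{d_scale:g}"
        yield cfg


for cfg in sweep(BASELINE):
    print(cfg["name"], cfg["lambda_rec"], cfg["d_lr"])
```

Scaling relative to a baseline keeps the sweep tied to whatever values the working configuration already uses.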