Closed kafan1986 closed 1 year ago
Yes, n_speakers should match the total number of speakers you have.
I haven't tried modifying the max text length, so I can't say anything about it.
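A quick way to find the number to put in n_speakers is to count the distinct speaker IDs in the training filelist. The sketch below assumes each filelist line has the form path|speaker_id|text; that layout, the function name, and the sample lines are assumptions for illustration, so adjust the separator and column to match your own files.

```python
def count_distinct_speakers(filelist_lines, sep="|", speaker_col=1):
    """Return the number of distinct speaker IDs found in the filelist lines.

    Assumes (hypothetically) that each line looks like: path|speaker_id|text
    """
    speakers = set()
    for line in filelist_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        speakers.add(fields[speaker_col])
    return len(speakers)


# Made-up sample lines, only to show the expected shape of the input.
sample_lines = [
    "wavs/a_001.wav|0|Hello there.",
    "wavs/a_002.wav|0|How are you?",
    "wavs/b_001.wav|1|Good morning.",
]

print(count_distinct_speakers(sample_lines))
```

For a real filelist you would pass the lines of the file itself, for example the result of open(path, encoding="utf-8").readlines(). If the speaker IDs are integers used to index an embedding table, n_speakers probably also needs to be larger than the highest ID, which is worth checking separately.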
It is throwing this error.
AttributeError: 'HParams' object has no attribute 'gin_channels'
What would be a suitable value for gin_channels, so that I can add it to the config file?
You can add "gin_channels": 256 to the model section of your configuration file.
{
"model": {
"gin_channels": 256
}
}
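If you would rather patch the config from a script than edit it by hand, something like the following should work. It is only a sketch: the helper name and the placeholder key in the sample config are made up, and 256 is simply the value suggested above, not a tuned recommendation.

```python
import json


def ensure_gin_channels(config, value=256):
    """Add model.gin_channels to a config dict if it is not already set.

    An existing gin_channels value is left untouched.
    """
    model = config.setdefault("model", {})
    model.setdefault("gin_channels", value)
    return config


# Hypothetical minimal config; a real config file has many more keys.
config = json.loads('{"model": {"some_existing_key": 1}}')
ensure_gin_channels(config)
print(json.dumps(config, indent=2))
```

To apply it to a real file, load the file with json.load, call the helper, and write the result back with json.dump.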
I have multispeaker data and I am using the train_ms.py script. I have a few questions.
a) Do I need to update "n_speakers" to match the number of distinct speakers in my dataset?
b) I am currently running training with "n_speakers" = 0 (the default), and the training is at around 60K. The generated output seems bad, especially in terms of the duration of the generated audio versus the ground truth: the generated output is 7 seconds long, whereas the ground truth is 11 seconds.
c) During training I changed "max_text_len" in the config from the default 190 to 320, so that more segments pass the threshold, as my dataset has some longer utterances/transcripts. Could this be a cause of the quality issue I am seeing?