FENRlR / MB-iSTFT-VITS2

Application of MB-iSTFT-VITS components to vits2_pytorch
MIT License

Multispeaker training issue #13

Closed: kafan1986 closed this issue 1 year ago

kafan1986 commented 1 year ago

I have multi-speaker data and am using the train_ms.py script. I have a few questions.

a) Do I need to set "n_speakers" to the number of distinct speakers in my dataset?
b) I am currently training with "n_speakers" = 0 (the default), and training is at around 60K steps. The generated output is poor, especially in terms of duration: a generated utterance lasts 7 seconds, whereas the ground truth lasts 11 seconds.
c) During training I changed "max_text_len" in the config from the default 190 to 320, so that more segments pass the length threshold, since my dataset has some longer utterances/transcripts. Could this be causing the quality issue I am seeing?
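To make question c) concrete: a length threshold like "max_text_len" decides which utterances enter training at all. The sketch below is a hypothetical stand-in for that kind of filter, assuming (as in VITS-style data loaders) that each entry is (wav_path, speaker_id, text) and that an entry is kept only when min_text_len <= len(text) <= max_text_len. The function name and sample data are illustrative, not taken from this repository, and the sketch says nothing about whether raising the limit affects quality.

```python
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (wav_path, speaker_id, text); assumed layout


def filter_by_text_len(
    entries: List[Entry], min_len: int = 1, max_len: int = 190
) -> Tuple[List[Entry], List[Entry]]:
    """Hypothetical filter: split entries into kept/dropped by transcript length."""
    kept, dropped = [], []
    for entry in entries:
        text = entry[2]
        if min_len <= len(text) <= max_len:
            kept.append(entry)  # within the threshold: used for training
        else:
            dropped.append(entry)  # too short or too long: skipped
    return kept, dropped


if __name__ == "__main__":
    # Illustrative data: one 150-character and one 250-character transcript.
    sample = [("a.wav", "0", "x" * 150), ("b.wav", "1", "x" * 250)]
    for limit in (190, 320):
        kept, dropped = filter_by_text_len(sample, max_len=limit)
        print(limit, len(kept), len(dropped))
```

Raising the limit from 190 to 320 only changes which utterances are admitted; the longer ones also make individual batches larger, which is worth keeping in mind when comparing runs.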

FENRlR commented 1 year ago

Yes, n_speakers should be matched with the total number of speakers you have. I haven't tried modifying max_text_len, so I can't say anything about it.
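One way to pick the value is to read it off the training filelist. This is a hypothetical helper, assuming each filelist line has the form wav_path|speaker_id|text with integer speaker IDs (the layout of the original VITS multi-speaker filelists). It returns max ID + 1 rather than the number of distinct IDs because, in VITS-style models, the speaker ID indexes a speaker embedding table, which needs that many rows; the two numbers agree when IDs run contiguously from 0.

```python
from typing import Iterable


def infer_n_speakers(lines: Iterable[str]) -> int:
    """Hypothetical helper: smallest n_speakers covering every speaker ID in a filelist.

    Assumes lines of the form "wav_path|speaker_id|text" with integer IDs.
    """
    speaker_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # maxsplit=2 keeps any "|" inside the transcript attached to the text field
        parts = line.split("|", 2)
        speaker_ids.add(int(parts[1]))
    if not speaker_ids:
        raise ValueError("filelist contains no entries")
    # An embedding table indexed by speaker ID needs max_id + 1 rows.
    return max(speaker_ids) + 1


if __name__ == "__main__":
    filelist = ["a.wav|0|hello", "b.wav|2|world", "c.wav|1|again"]
    print(infer_n_speakers(filelist))
```

With real data you would pass the lines of your training filelist, for example via `open(path, encoding="utf-8")`.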

kafan1986 commented 1 year ago


After setting n_speakers to match my number of speakers, training throws this error:

AttributeError: 'HParams' object has no attribute 'gin_channels'

What value should gin_channels have, so that I can add it to the config file?
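The error comes from how the JSON config is exposed to the training script. The sketch below is a simplified stand-in for a VITS-style HParams container, not this repository's exact code: nested dicts become attribute-accessible objects, so any key absent from the "model" section only fails at the moment the script reads it. The config values here are illustrative.

```python
class HParams:
    """Simplified stand-in for a VITS-style hyperparameter container."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            if isinstance(value, dict):
                value = HParams(**value)  # recurse into nested sections such as "model"
            setattr(self, key, value)

    def __contains__(self, key):
        return key in self.__dict__


# Illustrative config: "model" has no "gin_channels" entry.
config = {"data": {"n_speakers": 4}, "model": {"hidden_channels": 192}}
hps = HParams(**config)

try:
    hps.model.gin_channels  # missing key -> AttributeError on attribute lookup
except AttributeError as err:
    print(err)
```

This is why the fix is simply to add the key to the config file rather than to change any code.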

FENRlR commented 1 year ago

You can add "gin_channels": 256 to the "model" section of your configuration file:

{
  "model": {
    "gin_channels": 256
  }
}
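If you would rather patch an existing config than edit it by hand, here is a minimal sketch using only the standard library. The function name `ensure_gin_channels` is hypothetical; it assumes the config is a JSON file with a top-level "model" object (as in the snippet above) and uses 256, the value suggested in this thread, as the default. An existing gin_channels value is left untouched.

```python
import json
from pathlib import Path


def ensure_gin_channels(config_path: str, gin_channels: int = 256) -> dict:
    """Hypothetical helper: add "gin_channels" to the "model" section if it is missing."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    model_section = config.setdefault("model", {})
    model_section.setdefault("gin_channels", gin_channels)  # no-op if already set
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Note that rewriting the file this way normalizes its indentation, so keep a copy if you care about the original formatting.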