coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Error in training Capacitron #1832

Closed · arif334 closed this issue 2 years ago

arif334 commented 2 years ago

Describe the bug

I was training a Capacitron model with my own dataset (bn-BD, 12-hour). Training started successfully, but it stopped after 27 epochs (around 5500 steps) with the following error message:

...
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)

I believe it's a PyTorch issue. Can someone guide me in solving this problem?
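
For reference, the ValueError itself is only PyTorch's argument validation rejecting the NaN values produced upstream in the Capacitron posterior. A minimal standalone sketch (independent of TTS; shapes copied from the error above) reproduces the same message:

import torch
from torch.distributions import MultivariateNormal

# Hypothetical stand-ins for the Capacitron posterior parameters that
# turned into NaN during training; shapes match the (48, 128) batch in the log.
mu = torch.full((48, 128), float("nan"))
sigma = torch.ones(48, 128)

# Raises the same ValueError: the `loc` tensor violates
# IndependentConstraint(Real(), 1) because it contains NaNs.
MultivariateNormal(mu, torch.diag_embed(sigma))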

To Reproduce

I was running this experiment in Colab. Here's the notebook: link

Here's the config.json file.

Expected behavior

No response

Logs

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1534, in fit
    self._fit()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1518, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1283, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 1124, in train_step
    num_optimizers=1,
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/usr/local/lib/python3.7/dist-packages/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 327, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/tacotron2.py", line 198, in forward
    speaker_embedding=embedded_speakers if self.capacitron_vae.capacitron_use_speaker_embedding else None,
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/models/base_tacotron.py", line 257, in compute_capacitron_VAE_embedding
    speaker_embedding,  # pylint: disable=not-callable
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/TTS/tts/layers/tacotron/capacitron_layers.py", line 66, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (48, 128)) of distribution MultivariateNormal(loc: torch.Size([48, 128]), covariance_matrix: torch.Size([48, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla T4"
        ],
        "available": true,
        "version": "11.3"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu113",
        "TTS": "0.7.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.7.13",
        "version": "#1 SMP Sun Apr 24 10:03:06 PDT 2022"
    }
}

Additional context

No response

erogol commented 2 years ago

@WeberJulian

a-froghyar commented 2 years ago

@arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I had NaN issues in the posterior as well when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 in your case). Let me know if it helps!
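
In config terms, that just means raising the minimum clip length, which is given in samples. A minimal sketch, assuming the config is built in Python with Tacotron2Config rather than by editing config.json directly, and assuming a 22050 Hz sample rate:

from TTS.tts.configs.tacotron2_config import Tacotron2Config

# Drop clips shorter than ~1 second (22050 samples at 22.05 kHz).
# The value is illustrative, not tuned for any particular dataset.
config = Tacotron2Config(
    min_audio_len=22050,
)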

WeberJulian commented 2 years ago

Posting the answer I gave on the coqui-ai/TTS channel last Friday:

I don't think that's the issue; I also got it in regular training. I guess it's just part of Capacitron's instabilities. Try continuing the run with slightly different dataset parameters.
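
By "continuing the run" I mean resuming from the previous output folder after tweaking the dataset parameters in the config. A rough sketch, assuming the standard coqui trainer API; the run directory below is a placeholder:

from trainer import Trainer, TrainerArgs

# Point continue_path at the output folder of the interrupted run.
args = TrainerArgs(continue_path="output/capacitron-run")

# Then construct the Trainer as usual, e.g.:
# trainer = Trainer(args, config, output_path, model=model,
#                   train_samples=train_samples, eval_samples=eval_samples)
# trainer.fit()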

arif334 commented 2 years ago

> @arif334 I've had this issue before, and what I would suggest is increasing min_audio_len. I had NaN issues in the posterior as well when samples that were too short were fed to the model. Try increasing it to at least 1 s (22050 in your case). Let me know if it helps!

Okay, thanks. I'll report back with an update.

arif334 commented 2 years ago

Update: the error returned after 125 epochs. My samples are between 1 and 10 seconds, and the model doesn't seem to be learning well; all the loss curves are trending upward! @WeberJulian @a-froghyar

a-froghyar commented 2 years ago

Are you using phonemes? use_phonemes is False in your config.

a-froghyar commented 2 years ago

I'd also try reducing max_audio_len to 6 seconds.
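
Combined with the min_audio_len suggestion above, that would look roughly like this (both fields are in samples, so 6 s at 22050 Hz is 132300; the values are illustrative):

from TTS.tts.configs.tacotron2_config import Tacotron2Config

# Keep only clips between roughly 1 and 6 seconds long.
config = Tacotron2Config(
    min_audio_len=1 * 22050,
    max_audio_len=6 * 22050,
)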

arif334 commented 2 years ago

> Are you using phonemes? use_phonemes is False in your config.

Unfortunately, no. My language is not supported in gruut, and espeak performs poorly.

> I'd also try reducing max_audio_len to 6 seconds.

Should I reduce max_audio_len as well? That would also reduce the total duration of the dataset (probably to under 10 hours).

WeberJulian commented 2 years ago

Then the task might be too hard for Capacitron without a phonemizer.

arif334 commented 2 years ago

> Then the task might be too hard for Capacitron without a phonemizer.

That was my assumption as well, so I'm going to postpone my Capacitron training for now. I'm working on developing my own phonemizer and hope to come back when it's ready.

manmay-nakhashi commented 2 years ago

I'm facing the same issue.

manmay-nakhashi commented 2 years ago

config.txt

manmay-nakhashi commented 2 years ago

Traceback (most recent call last):
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(*input_args)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
    encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
  File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
    (VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], grad_fn=<ExpandBackward0>)