Closed manmay-nakhashi closed 2 years ago
reference_mels: tensor([[[-2.8908, -2.9343, -2.3924, ..., -4.0000, -4.0000, -4.0000],
[-1.7576, -2.1274, -1.4687, ..., -4.0000, -4.0000, -4.0000],
[-1.5798, -0.8811, 0.6645, ..., -4.0000, -4.0000, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-2.4235, -3.1822, -3.2268, ..., -4.0000, -4.0000, -4.0000],
[-2.8200, -3.1357, -3.4585, ..., -4.0000, -4.0000, -4.0000],
[-2.3820, -3.2600, -3.9348, ..., -3.7277, -3.9225, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-2.9306, -2.6850, -2.3694, ..., -4.0000, -4.0000, -4.0000],
[-1.4642, -1.8053, -1.2172, ..., -4.0000, -4.0000, -4.0000],
[-0.5219, -0.1292, -0.3755, ..., -4.0000, -4.0000, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
...,
[[-1.6988, -0.4720, 0.4284, ..., -4.0000, -4.0000, -4.0000],
[-1.5658, -0.4315, 1.0523, ..., -4.0000, -4.0000, -4.0000],
[-2.2187, -0.9966, 1.1072, ..., -4.0000, -4.0000, -4.0000],
...,
[-0.7557, -1.9654, -1.7806, ..., -4.0000, -4.0000, -4.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-2.3373, -2.8387, -3.2517, ..., -4.0000, -4.0000, -4.0000],
[-2.4404, -2.4244, -3.2637, ..., -4.0000, -4.0000, -4.0000],
[-2.6685, -2.3561, -2.6827, ..., -4.0000, -4.0000, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-3.3388, -3.7791, -3.5639, ..., -4.0000, -4.0000, -4.0000],
[-0.7557, -1.9654, -1.7806, ..., -4.0000, -4.0000, -4.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-2.3373, -2.8387, -3.2517, ..., -4.0000, -4.0000, -4.0000],
[-2.4404, -2.4244, -3.2637, ..., -4.0000, -4.0000, -4.0000],
[-2.6685, -2.3561, -2.6827, ..., -4.0000, -4.0000, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]],
[[-3.3388, -3.7791, -3.5639, ..., -4.0000, -4.0000, -4.0000],
[-3.5841, -4.0000, -3.9926, ..., -4.0000, -4.0000, -4.0000],
[-4.0000, -4.0000, -4.0000, ..., -3.9709, -4.0000, -4.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]]],
device='cuda:0')
mel_lengths: tensor([1049, 1050, 1048, 1051, 1052, 1057, 1052, 1052, 1058, 1050, 1050, 1049,
1052, 1056, 1053, 1049, 1046, 1048, 1049, 1051, 1051, 1056, 1055, 1050,
1047, 1046, 1056, 1046, 1056, 1053, 1050, 1056, 1047, 1045, 1049, 1046,
1055, 1055, 1049, 1056, 1050, 1045, 1056, 1052, 1049, 1047, 1049, 1047,
1048, 1048, 1056, 1048, 1050, 1045, 1055, 1054, 1047, 1054, 1052, 1053,
1057, 1044, 1056, 1052, 1053, 1049, 1057, 1049, 1045, 1052, 1056, 1050,
1047, 1048, 1056, 1052, 1045, 1051, 1048, 1047, 1054, 1049, 1050, 1050,
1052, 1046, 1057, 1053, 1057, 1055, 1053, 1051, 1052, 1053, 1056, 1049,
1057, 1046, 1050, 1049, 1051, 1056, 1050, 1052, 1049, 1050, 1052, 1047,
1054, 1051, 1046, 1053, 1057, 1049, 1046, 1055, 1058, 1047, 1056, 1057,
1046, 1051, 1056, 1053, 1045, 1056, 1048, 1048], device='cuda:0')
enc_out1: tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<SelectBackward0>)
enc_out2: tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
grad_fn=<CatBackward0>)
enc_out3: tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0',
It seems like the ReferenceEncoder is producing NaN values in the Capacitron layers.
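One way to narrow down which submodule first emits NaN is to attach forward hooks to every layer. The following is a minimal sketch, not part of Coqui TTS: `register_nan_hooks` is a hypothetical helper, and the small `nn.Sequential` model is only a stand-in for the real reference encoder.

```python
import torch
import torch.nn as nn


def register_nan_hooks(model: nn.Module):
    """Attach forward hooks that record modules whose output contains NaN."""
    offenders = []  # module names, in the order their forward pass finished

    def make_hook(name):
        def hook(module, inputs, output):
            # Tuple/list outputs are unpacked one level; nested tuples are skipped.
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for out in outs:
                if torch.is_tensor(out) and torch.isnan(out).any():
                    offenders.append(name)
                    break
        return hook

    # Skip the root module (empty name) so only submodules are reported.
    handles = [
        m.register_forward_hook(make_hook(n))
        for n, m in model.named_modules()
        if n
    ]
    return offenders, handles


# Hypothetical stand-in for the encoder: the last layer has NaN weights.
toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
with torch.no_grad():
    toy[2].weight.fill_(float("nan"))

offenders, handles = register_nan_hooks(toy)
toy(torch.randn(3, 4))
for h in handles:
    h.remove()  # always detach hooks once the check is done
print(offenders)
```

The first name in `offenders` is the earliest submodule whose output went NaN, which is usually the place to start looking.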
Update:
enc_out = self.encoder(reference_mels, mel_lengths)
enc_out = torch.nan_to_num(enc_out)
Adding nan_to_num resolves the issue for now; I am still monitoring my training.
If everything is nan, won't nan_to_num just replace everything with zeros? Not sure it will fix the training
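The concern can be checked on a small tensor. This sketch uses made-up values rather than real encoder output: `torch.nan_to_num` replaces NaN with 0.0 by default, so a fully NaN row becomes a row of zeros, while a per-row `torch.isnan` mask shows which samples were affected.

```python
import torch

# Made-up encoder output: the first sample is fine, the second is all NaN.
enc_out = torch.tensor([[0.5, -1.2], [float("nan"), float("nan")]])

# Per-sample mask: True where any element of the row is NaN.
bad_rows = torch.isnan(enc_out).any(dim=1)

# NaN -> 0.0 by default; finite values are left untouched.
cleaned = torch.nan_to_num(enc_out)

print(bad_rows)
print(cleaned)
```

If `bad_rows` is True for only a few samples, the zeros affect just those rows; if it is True everywhere, the encoder output carries no information after the replacement.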
@lexkoro it's coming from some of the samples. How do we skip those?
Remove them from the dataset? ^^
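To remove the offending samples they first have to be identified. A rough sketch of one approach is to run the encoder on each sample individually and record the indices whose output is not finite. Everything named here is an assumption for illustration: `find_bad_samples` and `toy_encoder` are hypothetical, the `(mel, length)` pairs are not the Coqui TTS dataset format, and the zero-length sample is only a convenient way to make the toy encoder produce NaN, not a diagnosis of this issue.

```python
import torch


def find_bad_samples(encoder, samples):
    """Return indices of samples whose encoder output is not fully finite.

    `encoder` is any callable taking (mels, mel_lengths); `samples` is a
    sequence of (mel, length) pairs with mel shaped [T, n_mels].
    """
    bad = []
    with torch.no_grad():
        for idx, (mel, length) in enumerate(samples):
            out = encoder(mel.unsqueeze(0), torch.tensor([length]))
            if not torch.isfinite(out).all():
                bad.append(idx)
    return bad


def toy_encoder(mels, lengths):
    """Toy stand-in: mean over valid frames; length 0 gives 0/0 = NaN."""
    frame_idx = torch.arange(mels.size(1))[None, :]
    mask = (frame_idx < lengths[:, None]).unsqueeze(-1)
    return (mels * mask).sum(dim=1) / lengths[:, None]


samples = [
    (torch.ones(5, 3), 5),
    (torch.ones(5, 3), 0),  # degenerate sample that triggers NaN in the toy
    (torch.ones(5, 3), 2),
]

bad = find_bad_samples(toy_encoder, samples)
bad_set = set(bad)
kept = [s for i, s in enumerate(samples) if i not in bad_set]
print(bad, len(kept))
```

With the real model, the recorded indices could be mapped back to file names and dropped from the training metadata.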
hey @manmay-nakhashi ~ just curious, but have you been able to train Capacitron using the latest coqui (v0.8.0)? And any reason you are using an older version of python (3.7.12)?
No reason. I have been using 3.8 and 3.9 with coqui, and so far there has been no problem.
@WeberJulian can you take a look into that?
Training Capacitron is hard because it is pretty unstable. Try the latest recipe, which improved stability (at least for alignments); you can find it in the latest TTS version.
Describe the bug
Traceback (most recent call last):
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1533, in fit
    self._fit()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1517, in _fit
    self.train_epoch()
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1282, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 1114, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 998, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/trainer-0.0.14-py3.8.egg/trainer/trainer.py", line 954, in _model_train_step
    return model.train_step(input_args)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 352, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/home/manmay/TTS/TTS/tts/models/tacotron2.py", line 216, in forward
    encoder_outputs, capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
  File "/home/manmay/TTS/TTS/tts/models/base_tacotron.py", line 254, in compute_capacitron_VAE_embedding
    (VAE_outputs, posterior_distribution, prior_distribution, capacitron_beta,) = self.capacitron_vae_layer(
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/manmay/TTS/TTS/tts/layers/tacotron/capacitron_layers.py", line 67, in forward
    self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/multivariate_normal.py", line 146, in __init__
    super(MultivariateNormal, self).__init__(batch_shape, event_shape, validate_args=validate_args)
  File "/opt/conda/envs/coqui/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 128)) of distribution MultivariateNormal(loc: torch.Size([128, 128]), covariance_matrix: torch.Size([128, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]], grad_fn=)
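The final ValueError can be reproduced in isolation, which confirms that the distribution constructor is only where the NaN is detected, not where it originates. This is a small sketch with made-up shapes; `validate_args=True` is passed explicitly so the behaviour does not depend on the library default.

```python
import torch
from torch.distributions import MultivariateNormal as MVN

# Made-up parameters: a NaN mean and a valid diagonal covariance.
mu = torch.full((2, 3), float("nan"))
sigma = torch.ones(2, 3)

try:
    MVN(mu, torch.diag_embed(sigma), validate_args=True)
    raised = False
except ValueError:
    # Argument validation rejects the NaN `loc`, as in the traceback above.
    raised = True

# A cheap guard that could run before building the distribution.
mu_is_finite = bool(torch.isfinite(mu).all())
print(raised, mu_is_finite)
```

A check like `torch.isfinite(mu).all()` placed before the constructor gives a chance to log the batch that caused the problem instead of failing inside `torch.distributions`.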
To Reproduce
config.txt