NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0

WARNING:root:NaN or Inf found in input tensor. #54

Open VladC12 opened 4 years ago

VladC12 commented 4 years ago

GPU: GTX 1060, 6 GB

```
❯ python train.py -c config.json -p train_config.output_directory=outdir train_config.output_directory=outdir output_directory=outdir
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000, 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234, 'checkpoint_path': '', 'ignore_layers': [], 'include_layers': ['speaker', 'encoder', 'embedding'], 'warmstart_checkpoint_path': '', 'with_tensorboard': True, 'fp16_run': False},
 'data_config': {'training_files': 'filelists/ljs_audiopaths_text_sid_train_filelist.txt', 'validation_files': 'filelists/ljs_audiopaths_text_sid_val_filelist.txt', 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5, 'cmudict_path': 'data/cmudict_dictionary', 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'max_wav_value': 32768.0},
 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'},
 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80, 'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2, 'mel_encoder_n_hidden': 512, 'n_components': 0, 'mean_scale': 0.0, 'fixed_gaussian': True, 'dummy_speaker_embedding': False, 'use_gate_layer': True}}
```
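Since the loss is nan from the very first iteration, it may be worth ruling out problematic audio before suspecting the model. Below is a minimal sanity-check sketch, not part of the original report: it assumes the filelist lines use the LJSpeech-style `wav_path|text|speaker_id` format, and it reads the wavs with `scipy.io.wavfile` (which `data.py` appears to use, judging by the warning further down) to flag clips that are unreadable, empty, non-finite, or not at the configured 22050 Hz.

```python
# Hypothetical filelist sanity check; assumes "wav_path|text|speaker_id" lines.
import json
import numpy as np
from scipy.io import wavfile

with open("config.json") as f:
    config = json.load(f)

data_cfg = config["data_config"]
expected_sr = data_cfg["sampling_rate"]  # 22050 in this config

bad = []
with open(data_cfg["training_files"]) as f:
    for line in f:
        wav_path = line.strip().split("|")[0]
        try:
            sr, audio = wavfile.read(wav_path)
        except Exception as e:
            bad.append((wav_path, f"unreadable: {e}"))
            continue
        audio = audio.astype(np.float32)
        if sr != expected_sr:
            bad.append((wav_path, f"sampling rate {sr} != {expected_sr}"))
        if audio.size == 0:
            bad.append((wav_path, "empty file"))
        elif not np.isfinite(audio).all():
            bad.append((wav_path, "NaN/Inf samples"))

print(f"{len(bad)} problematic files")
for path, reason in bad[:20]:
    print(path, "-", reason)
```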

```
got rank 0 and world size 1 ...
Flowtron(
  (speaker_embedding): Embedding(1, 128)
  (embedding): Embedding(185, 512)
  (flows): ModuleList(
    (0): AR_Step( (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,)) (lstm): LSTM(1664, 1024, num_layers=2) (attention_lstm): LSTM(80, 1024) (attention_layer): Attention( (softmax): Softmax(dim=2) (query): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=640, bias=False) ) (key): LinearNorm( (linear_layer): Linear(in_features=640, out_features=640, bias=False) ) (value): LinearNorm( (linear_layer): Linear(in_features=640, out_features=640, bias=False) ) (v): LinearNorm( (linear_layer): Linear(in_features=640, out_features=1, bias=False) ) ) (dense_layer): DenseLayer( (layers): ModuleList( (0): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=1024, bias=True) ) (1): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=1024, bias=True) ) ) ) )
    (1): AR_Back_Step( (ar_step): AR_Step( (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,)) (lstm): LSTM(1664, 1024, num_layers=2) (attention_lstm): LSTM(80, 1024) (attention_layer): Attention( (softmax): Softmax(dim=2) (query): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=640, bias=False) ) (key): LinearNorm( (linear_layer): Linear(in_features=640, out_features=640, bias=False) ) (value): LinearNorm( (linear_layer): Linear(in_features=640, out_features=640, bias=False) ) (v): LinearNorm( (linear_layer): Linear(in_features=640, out_features=1, bias=False) ) ) (dense_layer): DenseLayer( (layers): ModuleList( (0): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=1024, bias=True) ) (1): LinearNorm( (linear_layer): Linear(in_features=1024, out_features=1024, bias=True) ) ) ) (gate_layer): LinearNorm( (linear_layer): Linear(in_features=1664, out_features=1, bias=True) ) ) )
  )
  (encoder): Encoder( (convolutions): ModuleList( (0): Sequential( (0): ConvNorm( (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,)) ) (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) ) (1): Sequential( (0): ConvNorm( (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,)) ) (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) ) (2): Sequential( (0): ConvNorm( (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,)) ) (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False) ) ) (lstm): LSTM(512, 256, batch_first=True, bidirectional=True) )
)
Number of speakers : 1
output directory outdir
Epoch: 0
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
C:\AI_Research_Project\flowtron\flowtron.py:373: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:19.)
  self.score_mask_value)
0: nan
WARNING:root:NaN or Inf found in input tensor.
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
Mean None LogVar None Prob None
Validation loss 0: nan
WARNING:root:NaN or Inf found in input tensor.
Saving model and optimizer state at iteration 0 to outdir/model_0
1: nan
WARNING:root:NaN or Inf found in input tensor.
2: nan
```
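Because every reported loss is already nan at iteration 0, PyTorch's anomaly mode can help locate the first operation that produces a non-finite value. The sketch below is a hypothetical debugging aid, not code from `train.py`; the suggested call sites inside the training loop are assumptions.

```python
import torch

# Report the backward op that produced NaN/Inf (slows training, debugging only).
torch.autograd.set_detect_anomaly(True)

def check_finite(name, tensor):
    """Raise as soon as a tensor contains NaN or Inf, naming the offender."""
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} contains NaN/Inf "
                           f"(min={tensor.min().item()}, max={tensor.max().item()})")

# Hypothetical placement inside the training loop in train.py:
#   check_finite("mel input", mel)
#   check_finite("loss", loss)
#   loss.backward()
```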

And that error keeps on going. EDIT: It just finished doing its thing.

After `319: nan`:

```
319: nan
WARNING:root:NaN or Inf found in input tensor.
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 238, in train
    loss.backward()
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure
```
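One thing worth noting about this traceback: CUDA kernels launch asynchronously, so `cudaErrorLaunchFailure` is often reported at a later synchronization point (here `loss.backward()`) rather than at the kernel that actually failed. A hedged suggestion, not something tried in this thread: rerun with synchronous launches so the error surfaces at the real failure site, e.g. `CUDA_LAUNCH_BLOCKING=1 python train.py -c config.json ...` from the shell, or equivalently from Python:

```python
# Hypothetical rerun helper: force synchronous CUDA kernel launches so the
# traceback points at the failing kernel instead of a later sync point.
# The variable must be set before torch initializes CUDA (requires a CUDA GPU).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

x = torch.randn(4, device="cuda")  # any CUDA op now launches synchronously
print(x.sum())
```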

adrianastan commented 4 years ago
