```
got rank 0 and world size 1 ...
Flowtron(
  (speaker_embedding): Embedding(1, 128)
  (embedding): Embedding(185, 512)
  (flows): ModuleList(
    (0): AR_Step(
      (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
      (lstm): LSTM(1664, 1024, num_layers=2)
      (attention_lstm): LSTM(80, 1024)
      (attention_layer): Attention(
        (softmax): Softmax(dim=2)
        (query): LinearNorm(
          (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
        )
        (key): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (value): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=640, bias=False)
        )
        (v): LinearNorm(
          (linear_layer): Linear(in_features=640, out_features=1, bias=False)
        )
      )
      (dense_layer): DenseLayer(
        (layers): ModuleList(
          (0): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (1): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
          )
        )
      )
    )
    (1): AR_Back_Step(
      (ar_step): AR_Step(
        (conv): Conv1d(1024, 160, kernel_size=(1,), stride=(1,))
        (lstm): LSTM(1664, 1024, num_layers=2)
        (attention_lstm): LSTM(80, 1024)
        (attention_layer): Attention(
          (softmax): Softmax(dim=2)
          (query): LinearNorm(
            (linear_layer): Linear(in_features=1024, out_features=640, bias=False)
          )
          (key): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (value): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=640, bias=False)
          )
          (v): LinearNorm(
            (linear_layer): Linear(in_features=640, out_features=1, bias=False)
          )
        )
        (dense_layer): DenseLayer(
          (layers): ModuleList(
            (0): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (1): LinearNorm(
              (linear_layer): Linear(in_features=1024, out_features=1024, bias=True)
            )
          )
        )
        (gate_layer): LinearNorm(
          (linear_layer): Linear(in_features=1664, out_features=1, bias=True)
        )
      )
    )
  )
  (encoder): Encoder(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): InstanceNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
      )
    )
    (lstm): LSTM(512, 256, batch_first=True, bidirectional=True)
  )
)
Number of speakers : 1
output directory outdir
Epoch: 0
```
```
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
```
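This first UserWarning is cosmetic: the audio buffer handed to `torch.from_numpy` is read-only, and copying it first makes the warning go away. A minimal sketch of the fix one could apply around `data.py` line 40 (assuming `data` is the NumPy array returned by the audio loader):

```python
import numpy as np
import torch

def to_float_tensor(data: np.ndarray) -> torch.Tensor:
    # np.copy gives PyTorch a writable buffer it owns, which
    # silences the "non-writeable tensor" UserWarning.
    return torch.from_numpy(np.copy(data)).float()
```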
```
C:\AI_Research_Project\flowtron\flowtron.py:373: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\cuda\LegacyDefinitions.cpp:19.)
  self.score_mask_value)
```
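This deprecation warning only says the attention mask reaching `masked_fill` is `uint8` where newer PyTorch expects `bool`; casting the mask is enough. A minimal sketch (the tensor names here are illustrative, not Flowtron's actual variables):

```python
import torch

scores = torch.randn(2, 4, 4)                             # attention energies
mask = torch.randint(0, 2, (2, 4, 4), dtype=torch.uint8)  # old-style uint8 mask
score_mask_value = -float("inf")

# Cast the uint8 mask to bool before masked_fill to avoid the warning.
scores = scores.masked_fill(mask.bool(), score_mask_value)
```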
```
0: nan
WARNING:root:NaN or Inf found in input tensor.
C:\AI_Research_Project\flowtron\data.py:40: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
Mean None
LogVar None
Prob None
Validation loss 0: nan
WARNING:root:NaN or Inf found in input tensor.
Saving model and optimizer state at iteration 0 to outdir/model_0
1: nan
WARNING:root:NaN or Inf found in input tensor.
2: nan
```
And that error keeps repeating for every iteration.

EDIT: It just finished doing its thing. After iteration 319 it crashed:
```
319: nan
WARNING:root:NaN or Inf found in input tensor.
Traceback (most recent call last):
  File "train.py", line 300, in <module>
    train(n_gpus, rank, **train_config)
  File "train.py", line 238, in train
    loss.backward()
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\vladc\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: transform: failed to synchronize: cudaErrorLaunchFailure: unspecified launch failure
```
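The loss is nan from iteration 0, so by the time `backward` runs the graph is full of non-finite values, and on some driver/PyTorch combinations that can surface as this unrelated-looking `cudaErrorLaunchFailure`. While hunting the root cause, a guard like the following around `train.py`'s `loss.backward()` can keep the run alive and point at the first bad op; this is a sketch, not Flowtron's actual training-loop code:

```python
import torch

# Debug only: report the op that first produces NaN/Inf during backward (slow).
torch.autograd.set_detect_anomaly(True)

def safe_backward(loss: torch.Tensor) -> bool:
    # Skip the step if the loss is already non-finite instead of
    # propagating NaNs through the whole graph.
    if not torch.isfinite(loss).all():
        return False
    loss.backward()
    return True
```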
GPU: GTX 1060 6 GB
```
❯ python train.py -c config.json -p train_config.output_directory=outdir
train_config.output_directory=outdir
output_directory=outdir
{'train_config': {'output_directory': 'outdir', 'epochs': 10000000,
                  'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0,
                  'iters_per_checkpoint': 5000, 'batch_size': 1, 'seed': 1234,
                  'checkpoint_path': '', 'ignore_layers': [],
                  'include_layers': ['speaker', 'encoder', 'embedding'],
                  'warmstart_checkpoint_path': '', 'with_tensorboard': True,
                  'fp16_run': False},
 'data_config': {'training_files': 'filelists/ljs_audiopaths_text_sid_train_filelist.txt',
                 'validation_files': 'filelists/ljs_audiopaths_text_sid_val_filelist.txt',
                 'text_cleaners': ['flowtron_cleaners'], 'p_arpabet': 0.5,
                 'cmudict_path': 'data/cmudict_dictionary',
                 'sampling_rate': 22050, 'filter_length': 1024,
                 'hop_length': 256, 'win_length': 1024, 'mel_fmin': 0.0,
                 'mel_fmax': 8000.0, 'max_wav_value': 32768.0},
 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'},
 'model_config': {'n_speakers': 1, 'n_speaker_dim': 128, 'n_text': 185,
                  'n_text_dim': 512, 'n_flows': 2, 'n_mel_channels': 80,
                  'n_attn_channels': 640, 'n_hidden': 1024, 'n_lstm_layers': 2,
                  'mel_encoder_n_hidden': 512, 'n_components': 0,
                  'mean_scale': 0.0, 'fixed_gaussian': True,
                  'dummy_speaker_embedding': False, 'use_gate_layer': True}}
```
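For reference on the echoed `train_config.output_directory=outdir` lines above: the `-p` flag overrides nested config keys via dotted paths. A rough sketch of how such an override can be applied to the parsed JSON (`apply_override` is a hypothetical helper, not Flowtron's actual parser):

```python
import json

def apply_override(config: dict, assignment: str) -> None:
    # Hypothetical helper: apply one "a.b.c=value" override to a nested dict.
    key_path, value = assignment.split("=", 1)
    *parents, leaf = key_path.split(".")
    node = config
    for key in parents:
        node = node[key]
    try:
        node[leaf] = json.loads(value)  # numbers/bools/lists parse as JSON
    except json.JSONDecodeError:
        node[leaf] = value              # fall back to a raw string

config = {"train_config": {"output_directory": ""}}
apply_override(config, "train_config.output_directory=outdir")
```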