Closed yhgon closed 6 years ago
Detailed error information. Case 1: the input dataset is 22K-sampled WAV, but the config.json options are set for 16K sampling, which produces the error below:
```
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    train(num_gpus, args.rank, args.group_name, **train_config)
  File "train.py", line 132, in train
    for i, batch in enumerate(train_loader):
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 281, in __next__
    return self._process_next_batch(batch)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 55, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/scratch/github/NVIDIA/nv-wavenet/pytorch/mel2samp_onehot.py", line 79, in __getitem__
    sampling_rate, self.sampling_rate))
IndexError: tuple index out of range
```
output directory checkpoints-2018-0601-lj
Epoch: 0
Case 2: the input dataset is 22K-sampled WAV and the config.json options are set for 22K sampling, which produces the error below:
output directory checkpoints-2018-0601-lj
Epoch: 0
```
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    train(num_gpus, args.rank, args.group_name, **train_config)
  File "train.py", line 140, in train
    y_pred = model(x)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/github/NVIDIA/nv-wavenet/pytorch/wavenet.py", line 107, in forward
    assert(cond_input.size(2) >= forward_input.size(1))
AssertionError
```
@yhgon the tacotron2 repo supports any sampling rate. Just update `sampling_rate` and the other parameters accordingly to match your wavenet: https://github.com/NVIDIA/tacotron2/blob/master/hparams.py
@rafaelvalle The current PyTorch nv-wavenet implementation doesn't support the Param2 case below; it only supports the Param1 case you mentioned. I worry about voice quality: I think the Param2 case would give the best quality, but the current PyTorch nv-wavenet implementation cannot handle that configuration, as I showed in the Case 2 error. What training configuration do you recommend for Tacotron 2 + nv-wavenet in the down-sampling case? Also, what is your opinion on the quality loss in the Param1 and Param3 cases?
Down-sampling for both during training:

| Param1 | tacotron2 | NV-wavenet |
|---|---|---|
| sampling_rate | 16K | 16K |
| segment_length | 16K | 16K |
| filter_length | 800 | 800 |
| hop_length | 200 | 200 |
| win_length | 800 | 800 |
| mel_channels | 80 | 80 |
Matching all parameters:

| Param2 | tacotron2 | NV-wavenet |
|---|---|---|
| sampling_rate | 22K | 22K |
| segment_length | 22K | 22K |
| filter_length | 1024 | 1024 |
| hop_length | 256 | 256 |
| win_length | 1024 | 1024 |
| mel_channels | 80 | 80 |
Down-sampling during inference:

| Param3 | tacotron2 | NV-wavenet |
|---|---|---|
| sampling_rate | 22K | 16K |
| segment_length | 22K | 16K |
| filter_length | 1024 | 800 |
| hop_length | 256 | 200 |
| win_length | 1024 | 800 |
| mel_channels | 80 | 80 |
One should match all parameters. nv-wavenet parameters can be set in this file: https://github.com/NVIDIA/nv-wavenet/blob/master/pytorch/config.json The most relevant params are: stride, win_length, sampling_rate, upsamp_window, upsamp_stride
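The "match all parameters" advice is easy to automate. A minimal sketch (function and variable names are hypothetical; the key names follow the tables above):

```python
def param_mismatches(tacotron2_cfg, wavenet_cfg, keys):
    """Return {key: (tacotron2_value, wavenet_value)} for any values that differ."""
    return {k: (tacotron2_cfg.get(k), wavenet_cfg.get(k))
            for k in keys
            if tacotron2_cfg.get(k) != wavenet_cfg.get(k)}

SHARED_KEYS = ["sampling_rate", "segment_length", "filter_length",
               "hop_length", "win_length", "mel_channels"]

# Param2-style setup: both sides agree, so no mismatches are reported
param2_t2 = {"sampling_rate": 22050, "segment_length": 22050,
             "filter_length": 1024, "hop_length": 256,
             "win_length": 1024, "mel_channels": 80}
param2_wn = dict(param2_t2)
```

Running `param_mismatches` on a Param3-style setup would flag every STFT parameter, which is exactly why that configuration fails without an explicit resampling step.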
@yhgon When you train with LJSpeech, nv-wavenet requires that assert(cond_input.size(2) >= forward_input.size(1)) hold, which means the length of nv-wavenet's upsampled conditioning output has to be >= segment_length (T). Suppose the conditioning input has size (k, condition_channels), win_length is the kernel_size, and hop_length is the stride; then with no padding you have to satisfy (k - 1) * stride + kernel_size >= segment_length. You can check torch.nn.ConvTranspose1d.
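That inequality can be checked numerically without torch. A minimal sketch using the 22K values from this thread (the frame count assumes a centered STFT, i.e. n_frames = segment_length // hop_length + 1, which is an assumption about the mel extraction, not something stated in the repo):

```python
def upsampled_length(n_frames, stride, kernel_size):
    # torch.nn.ConvTranspose1d output length with no padding, dilation 1:
    # L_out = (L_in - 1) * stride + kernel_size
    return (n_frames - 1) * stride + kernel_size

segment_length = 22050
hop_length = 256   # also upsamp_stride
win_length = 1024  # also upsamp_window

n_frames = segment_length // hop_length + 1          # 87 with a centered STFT
cond_len = upsampled_length(n_frames, hop_length, win_length)
# cond_len = 23040 >= 22050, so wavenet.py's assertion passes
```

If the two sides of the config disagree (e.g. a 22K segment_length with 16K-style hop/window values), `cond_len` can fall short of `segment_length`, which is exactly the AssertionError from Case 2.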
@rafaelvalle @zhf459 thanks for your comments. I found the cause of the problem: when I set sampling_rate to 22050 instead of 22000, it works well.
```json
{
    "train_config": {
        "output_directory": "checkpoints-2018-0604-lj-22k",
        "epochs": 100000,
        "learning_rate": 1e-3,
        "iters_per_checkpoint": 1000,
        "batch_size": 12,
        "seed": 1234,
        "checkpoint_path": ""
    },
    "data_config": {
        "training_files": "lj_train_files.txt",
        "segment_length": 22050,
        "mu_quantization": 256,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "sampling_rate": 22050
    },
    "dist_config": {
        "dist_backend": "nccl",
        "dist_url": "tcp://localhost:54321"
    },
    "wavenet_config": {
        "n_in_channels": 256,
        "n_layers": 16,
        "max_dilation": 128,
        "n_residual_channels": 64,
        "n_skip_channels": 256,
        "n_out_channels": 256,
        "n_cond_channels": 80,
        "upsamp_window": 1024,
        "upsamp_stride": 256
    }
}
```
@yhgon please close the issue if it is resolved.
Resolved with the right config.
For Tacotron 2 training with the LJ dataset we use 22K sampling, but the nv-wavenet PyTorch implementation only supports 16K sampling.