I have created a branch in my repo that seems to fix this https://github.com/pneumoman/mellotron/tree/Fix_Short_Input_Lengths
Thanks a lot for the help! I will try testing your fix.
But may I ask what the problem actually is? Is it because the reference encoder requires a minimum mel length of 65 frames? Given 1 mel frame = 12.5 ms, if the audio is shorter than 65 x 12.5 ms (about 0.8 s), does that trigger the short-input-length error?
By the way, should this line:
torch.LongTensor([len(x[0]) for x in batch]),
use x[1] instead of x[0], i.e.:
torch.LongTensor([len(x[1]) for x in batch]),
where len(x[0]) is the length of the text and len(x[1]) is the length of the mel?
Thanks a lot.
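For reference, my rough understanding of the collate code around that line (a simplified sketch only, not the exact data_utils.py code):

```python
import torch

def collate_sketch(batch):
    # Rough sketch, not the actual TextMelCollate: each batch item is assumed
    # to be a tuple like (encoded_text, mel, ...), so x[0] is the 1-D text
    # tensor and x[1] is the 2-D mel (n_mel_channels x frames).
    input_lengths, ids_sorted = torch.sort(
        torch.LongTensor([len(x[0]) for x in batch]),  # text lengths
        dim=0, descending=True)
    # Mel lengths would instead be x[1].size(1); len(x[1]) would only give
    # the number of mel channels.
    output_lengths = torch.LongTensor([x[1].size(1) for x in batch])
    return input_lengths, ids_sorted, output_lengths
```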
They are using input_lengths, which I believe is the length of the encoded text. I don't really understand the change that was made in modules.py, but in implementing it there is a scaling step where input_length is divided by two raised to the power of the number of convolutions in the ReferenceEncoder. It is then rounded down, and in doing so some samples become zero. The call to nn.utils.rnn.pack_padded_sequence requires lengths greater than zero, which causes the error. I don't know, but a simpler fix might be to round up instead of rounding down (rough sketch below).
Also, I made the limit 65 instead of 64 to ensure the included samples still had some length left. I have a feeling the number might need to be even larger, but as I said, I don't really know what this change is achieving.
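To make the scaling I mean concrete, here is a minimal sketch (the layer count of 6 is an assumption based on the 64/65 limit, not copied from modules.py):

```python
import math
import torch
import torch.nn as nn

n_convs = 6          # assumed number of stride-2 convs in the ReferenceEncoder
mel_len = 60         # a short sample: fewer than 2**6 = 64 mel frames

floor_len = mel_len // 2 ** n_convs           # rounding down -> 0
ceil_len = math.ceil(mel_len / 2 ** n_convs)  # rounding up   -> 1

# pack_padded_sequence rejects zero lengths, which is exactly the error seen:
seq = torch.zeros(1, 1, 128)  # (batch, time, features)
try:
    nn.utils.rnn.pack_padded_sequence(
        seq, torch.tensor([floor_len]), batch_first=True, enforce_sorted=False)
except RuntimeError as e:
    print(e)  # "Length of all samples has to be greater than 0 ..."
```

With rounding up, even a partial window keeps at least one frame, so the sequence stays packable.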
I did not change that line; it's the same as what's in master, line 120, isn't it?
Yes, I am sorry, I made a mistake; you did not change that line, it is the same as in master.
You are right: "input_lengths, which I believe is the length of the encoded text".
Thanks a lot for explaining the root cause of the error, i.e. "some samples become zero" in the code.
As for "I don't really know what this change is achieving", me neither; I am still reading the code and will update if I reach a new understanding.
Thanks so much!
@ustraymond Hey, wondering if you've seen this (I'm running a very bastardized version of Mellotron, so I'm not sure this isn't my own fault), but I'm seeing an error happening in Tacotron2.parse_output.
Traceback (most recent call last):
  File "train.py", line 325, in <module>
    args.warm_start, args.n_gpus, args.rank, args.group_name, hparams)
  File "train.py", line 220, in train
    y_pred = model(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/mellotron/model.py", line 675, in forward
    output_lengths)
  File "/workspace/mellotron/model.py", line 646, in parse_output
    outputs[0].data.masked_fill_(mask, 0.0)
RuntimeError: The expanded size of the tensor (517) must match the existing size (460) at non-singleton dimension 2. Target sizes: [4, 80, 517]. Tensor sizes: [4, 80, 460]
Curious if you hit this too. (Note: the line numbers are probably wrong for you.)
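In case it helps, the failure seems to reduce to the padding mask and the decoder output disagreeing on the number of frames; a minimal sketch with made-up shapes matching the message above (not the actual Mellotron tensors):

```python
import torch

mel_outputs = torch.zeros(4, 80, 460)             # decoder produced 460 frames
mask = torch.zeros(4, 80, 517, dtype=torch.bool)  # mask built from output_lengths (max 517)

# masked_fill_ needs the mask to broadcast to the tensor's shape, so the
# 517-vs-460 mismatch at dimension 2 raises a size-mismatch RuntimeError like
# the one in the traceback, i.e. output_lengths implies more frames than the
# mel output actually has.
try:
    mel_outputs.masked_fill_(mask, 0.0)
except RuntimeError as e:
    print(e)
```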
I tested your code fix:
I pulled the latest master, patched in your change to data_utils.py, and used the LibriTTS filelist.
It worked; it ran 3000 more steps without the length error.
In other words, I am sorry, but I did not hit the problem you mentioned.
@pneumoman Hello, I hit this length error too. How did you fix it? Thanks. I guess the reason is that the batch size is not the same in each batch...?
@hongyuntw Did you check out the branch I referenced above?
Hi, I hit the parse_output size-mismatch error above too while using the Blizzard2013 dataset; have you fixed it? Another question: while reading the paper, I noticed the authors filtered out all audio longer than 10 s. Could that be the reason for the problem above?
Hi, I tried to repeat the model training on the LibriTTS data.
I downloaded the LibriTTS dataset.
I updated the path in filelist.txt.
I changed hparams.py.
Then I started the training, but I got the weird error below.
RuntimeError: Length of all samples has to be greater than 0, but found an element in 'lengths' that is <= 0
But when I checked with soxi -d, I did not find any audio with length <= 0 in the filelist; for example:
/8312/279790/8312_279790_000034_000002.wav 8.830000s
Did anyone face this error and know how to resolve it? Many thanks for any advice.
(I had no problem using the LJS data.)
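A rough way to check whether any clip in the filelist is too short for the reference encoder (a sketch only; the 12.5 ms hop, the 65-frame minimum, the filelist path, and the pipe-separated "path|text|speaker" format are assumptions, not taken from the repo):

```python
import subprocess

HOP_SECONDS = 0.0125   # ~12.5 ms per mel frame (assumed)
MIN_FRAMES = 65        # reference encoder needs roughly 2**6 frames to survive downsampling

def duration_seconds(wav_path):
    # soxi -D prints the duration as a plain number of seconds
    out = subprocess.check_output(["soxi", "-D", wav_path])
    return float(out.decode().strip())

with open("filelists/libritts_train_filelist.txt") as f:  # hypothetical path
    for line in f:
        wav_path = line.split("|")[0]                     # assumes path|text|speaker
        dur = duration_seconds(wav_path)
        approx_frames = int(dur / HOP_SECONDS)
        if approx_frames < MIN_FRAMES:
            print(f"likely too short: {wav_path} ({dur:.2f}s, ~{approx_frames} frames)")
```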