I also don't see why there should be a problem suddenly at this stage of training... It looks like the number of input symbols for some utterance is suddenly larger than the expected maximum value used to set up the padded text input tensor.

See line 462, where `max_input_len` is set to match the longest text input sequence in the batch, then line 488, where `dur_padded` is defined with shape `(batch, max_target_len, max_input_len)`. We fill this in per batch item in the following loop, and in your error you are trying to insert something with input length 114 when `max_input_len` is only 113.
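For orientation, the relevant logic is roughly the following (a simplified, untested sketch rather than the actual code; apart from `dur`, `dur_padded`, `max_input_len`, `fnames` and `input_lengths`, the names here are placeholders):

```python
import torch

def collate_durations(texts, durations, fnames):
    """Simplified sketch of the padding logic around lines 462-493 (not the actual code).

    texts[i] is the encoded symbol sequence, durations[i] a (target_len_i, input_len_i)
    duration tensor, and fnames[i] the utterance ID for batch item i.
    """
    max_input_len = max(len(t) for t in texts)           # line 462: longest text input in the batch
    max_target_len = max(d.size(0) for d in durations)
    # line 488: the padded tensor's last dimension comes from the *text* lengths
    dur_padded = torch.zeros(len(texts), max_target_len, max_input_len)
    for i, dur in enumerate(durations):
        # If dur.size(1) (114 in your traceback) exceeds max_input_len (113), i.e. the
        # stored durations cover more symbols than the encoded text, this assignment fails.
        dur_padded[i, :dur.size(0), :dur.size(1)] = dur
    return dur_padded
```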
You could check the problem and get the utterance ID and some other information by adding something like this assertion immediately before line 493 (you may need to fiddle with this, I haven't tested it):

```python
assert dur.size(1) <= max_input_len, f"{fnames[i]}, {dur.shape}, {input_lengths[i]}, {max_input_len}"
```
With that, I would go away and check what the original text is for that utterance, and what the output of your text processor's `encode_text()` method is, just in case there's anything unexpected, but I really don't know how the output of that could be different from one epoch to the next. Unfortunately that will take a little bit of manual setup, I think, i.e. instantiating an equivalent `TextProcessor` object given your configuration; I don't think all the stuff you would need to debug is available in the collate function that's throwing your error.
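Something along these lines might serve as a starting point (untested; the import path and constructor arguments are assumptions, so mirror whatever your training configuration actually passes in):

```python
# Untested sketch; the module path and constructor options are assumptions --
# instantiate TextProcessor exactly as your training config does.
from tts.text import TextProcessor  # adjust to the real module path

tp = TextProcessor()  # pass the same options as in your configuration

text = "original transcript for LJ036-0032, exactly as read from your metadata file"
symbols = tp.encode_text(text)
print(len(symbols), symbols)
```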
It also seems strange to me to run into a problem with input lengths at all, because I think with your `batch_size=4` and distributing to 4 GPUs, each GPU is actually running with a batch of 1: the `batch_size` option is the effective batch size you want to run, and the details of dividing per GPU are handled in the background (just in case you were reducing this number to account for your multiple GPUs).
Thank you very much for your answers; I will try to find out what is going on there.
Thank you for your help! I found an error in the data. It was my fault. One file from the LJS dataset (LJ036-0032) has a space as the last character of its text, and I mistakenly stripped it together with the '\n' symbol. Training seems to be running fine now. I'm closing this issue.
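For reference, the difference comes down to how the metadata line is stripped (an illustrative line, not the real transcript):

```python
line = "LJ036-0032|a transcript that ends with a space \n"

print(repr(line.strip()))        # also removes the trailing space -> one input symbol fewer
print(repr(line.rstrip("\n")))   # removes only the newline and keeps the trailing space
```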
I got another error when trying to train the model in the original FastPitch way, with `--use-mas` enabled, using 4 GPUs. It seems that it crashed at the end of the 4th epoch. I have no idea what the reason could be, as everything was fine for several epochs.
The error message is:
The last output lines were:
The setup is the same: