Closed: mahdeto closed this issue 5 years ago
In case someone runs into this: it turns out I had some corrupted wav files (very small, but containing some speech) that caused it.
This can also happen when the wav file is stereo rather than mono. Tacotron is using 16 kHz, 16-bit mono audio for training.
I am hitting the same problem now. Codec: PCM S16LE (s16l), Type: Audio, Channels: Mono, Sample rate: 16000 Hz, Bits per sample: 16
Maybe it is a codec problem. The info above was obtained from VLC. Can anyone tell us what a working setup looks like?
It is quite confusing. I also edited the hparams file accordingly, but it does not seem to work.
The audio was originally 22 kHz and is now 16 kHz, 16-bit mono for training. It still does not seem to work. Hmm...
Oh, thanks. I also suspected those weird small files. They are not corrupted, but something goes wrong when they are used as input. Solved. Thanks!
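For anyone who wants to find such files before training, here is a minimal sketch that scans a directory of wav files and flags the cases mentioned above: unreadable files, non-mono audio, an unexpected sample rate or bit depth, and very short clips. It uses only the Python standard library. The function names `check_wav` and `scan`, the default sample rate of 22050 Hz, and the 1024-sample minimum length are assumptions made for illustration, not values confirmed in this thread; set them to match your own hparams.

```python
import wave
from pathlib import Path

# Assumed defaults (not taken from the thread); adjust to your hparams.
EXPECTED_RATE = 22050   # sampling rate the model is configured for
MIN_SAMPLES = 1024      # clips shorter than this are flagged as too short


def check_wav(path, expected_rate=EXPECTED_RATE, min_samples=MIN_SAMPLES):
    """Return a list of problems found in one wav file (empty if none)."""
    try:
        with wave.open(str(path), "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
            width = wav.getsampwidth()   # bytes per sample
            frames = wav.getnframes()    # samples per channel
    except (wave.Error, EOFError) as exc:
        # Truncated or non-PCM files usually fail here.
        return ["unreadable: {}".format(exc)]

    problems = []
    if channels != 1:
        problems.append("{} channels, expected mono".format(channels))
    if rate != expected_rate:
        problems.append("sample rate {} Hz, expected {} Hz".format(rate, expected_rate))
    if width != 2:
        problems.append("{}-bit samples, expected 16-bit".format(8 * width))
    if frames < min_samples:
        problems.append("too short: {} samples, need at least {}".format(frames, min_samples))
    return problems


def scan(directory, **kwargs):
    """Map each problematic wav file under `directory` to its problems."""
    report = {}
    for path in sorted(Path(directory).rglob("*.wav")):
        problems = check_wav(path, **kwargs)
        if problems:
            report[path] = problems
    return report
```

Calling `scan("wavs")` on a hypothetical `wavs` folder and printing the result should list the files to fix or drop from the filelists. For 16 kHz data, pass `expected_rate=16000`.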
I am also getting this error when running inference:
RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (512, 512) at dimension 3 of input 4
What do you mean by corrupted wav files? When I used 16 kHz files I got a sample rate error, so I had to switch back to 22 kHz. My inference command:
python inference.py --tacotron2 output/checkpoint_Tacotron2_last.pt --waveglow output/checkpoint_WaveGlow_last.pt --cpu -o output/ -i phrases1/phrase.txt
I can run the same inference with the pretrained models, but not with my trained models.
I am seeing this error:
Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (512, 512) at dimension 3 of input [1, 1, 1, 220]
after trying to train using this command:
python train.py --output_directory=outdir --log_directory=logdir
I am using PyTorch 1.0 and Python 3.6 with a single Tesla V100 GPU. I am using my own dataset, which I processed to be identical to the LJSpeech format, and I have changed the filelists accordingly.
The full log is:
Please help. Thanks!
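A note on reading the error message: the input shape [1, 1, 1, 220] means the clip being padded has only 220 samples, while the requested padding is 512 on each side. Reflection-style padding needs the padding to be smaller than the dimension it is applied to, so any clip of 512 samples or fewer fails this check. The sketch below just restates that arithmetic. It assumes the padding is half of an STFT filter length of 1024, which is consistent with the 512 in the message but is not confirmed in this thread, and the helper names are hypothetical.

```python
def reflect_pad_ok(num_samples, filter_length=1024):
    """True if a clip is long enough to be padded by filter_length // 2 per side.

    filter_length=1024 is an assumption that matches the (512, 512) padding
    reported in the error message.
    """
    pad = filter_length // 2
    return pad < num_samples


def clip_duration_ms(num_samples, sample_rate):
    """Duration of a clip in milliseconds."""
    return 1000.0 * num_samples / sample_rate


# The clip from the error message: 220 samples is far below the 513 needed.
print(reflect_pad_ok(220))
# At 22050 Hz, 220 samples is only about 10 ms of audio.
print(round(clip_duration_ms(220, 22050), 2))
```

Under these assumptions, a 220-sample clip is almost certainly a broken or truncated file rather than real speech, which matches the "very small wav files" explanation given earlier in the thread.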