Test "python train.py --model='WaveNet' ", Assertion error.

Yeongtae commented 6 years ago

For testing, I use mulaw (8-bit) and raw (16-bit).

After running 'python wavenet_preprocess.py', I get these files.
After running "python train.py --model='WaveNet'" without any modification, I see an error.

The problem seems to be caused by part of 'map.txt'.

I modified 'hparams.py' to prevent the above error. Even though we use GTA mel files, we must set the parameter to False, which is very weird.
After running "python train.py --model='WaveNet'" with the modification, I see an assertion error.

Rayhane-mamah commented 6 years ago

Hello @Yeongtae, you make a good point: the documentation is missing an explanation of standalone Wavenet, so let me add a few notes here to explain quickly:

When you use "wavenet_preprocess.py", it is assumed that you are using only the Wavenet from the repo, without Tacotron, so the preprocessor sets '' for all GTA paths because they simply don't exist. Thus "train_with_GTA" must be set to False, like you did.
Your second bug happens on real mels, so the assertion error is raised for one of two reasons:
- Your hop_size is not the product of your upsample_scales. The defaults are 300 and [15, 20] for hop_size and upsample_scales respectively; note that 300 = 15 * 20. This must always hold.
- If the first condition holds, did you make sure to rerun the preprocessing AFTER you changed any audio/mel parameters? Changes in "hparams.py" that affect the audio preprocessing are only applied when you run the preprocessing again. Since you use "wavenet_preprocess.py", you may have changed the sample rate and hop size, and updated the upsample_scales to match, without running the preprocessing again.
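The first condition can be checked before launching training. The snippet below is only a minimal sketch: `check_upsampling` is a hypothetical helper, not a function from the repo, and it assumes hop_size is an integer and upsample_scales is a list of integers, as in the defaults quoted above.

```python
from functools import reduce
from operator import mul


def check_upsampling(hop_size, upsample_scales):
    """Return True when the upsampling layers expand one mel frame
    into exactly hop_size audio samples.

    Hypothetical helper for illustration; not part of the repo.
    """
    total = reduce(mul, upsample_scales, 1)  # product of all scales
    return total == hop_size


# Defaults quoted in this thread: 300 = 15 * 20
assert check_upsampling(300, [15, 20])

# A hop size changed without updating the scales fails the check
assert not check_upsampling(275, [15, 20])
```

If the check fails, either hop_size or upsample_scales needs to change, and the preprocessing has to be rerun whenever hop_size is the one that changed.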

A quick personal POV while we're at it: if you want to make an 8-bit Wavenet, I recommend going with mulaw-quantize instead of mulaw, as it converges faster for about the same quality. :)
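For readers unfamiliar with the two options, here is a rough sketch of the textbook mu-law formulas, assuming mu = 255 and input samples in [-1, 1]. The function names are illustrative and this is not the repo's own code: the first function returns a continuous companded value, the second maps it to one of 256 integer classes.

```python
import math


def mulaw(x, mu=255):
    """Compand a sample x in [-1, 1] to a float in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_quantize(x, mu=255):
    """Compand x, then map it to an integer class in [0, mu]."""
    y = mulaw(x, mu)
    return int((y + 1) / 2 * mu + 0.5)


print(mulaw(0.0))            # silence stays at 0.0
print(mulaw_quantize(-1.0))  # lowest class
print(mulaw_quantize(1.0))   # highest class
```

With the quantized form the network predicts a class label per sample, whereas with the continuous form it has to model a real-valued output.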

Rayhane-mamah commented 6 years ago

@Yeongtae, just adding another note here because everyone seems to get confused when it comes to GTA:

GTA: Ground-truth aligned spectrograms are NOT ground-truth labels. There are two ways to train Wavenet:
- Train with the ground-truth spectrograms, which are extracted from audio. To avoid confusion, I always call those real spectrograms.
- Train with the ground-truth aligned (GTA) spectrograms, which are predicted by the Tacotron model using teacher forcing at the decoder level (to ensure time alignment, hence the word "aligned"). These are called GTA spectrograms. This method trains the Wavenet on already lossy mel-spectrograms so that the learned distribution is closer to the synthesis test case than when training on real spectrograms. (Wavenet gets used to seeing lossy spectrograms during training.)

The Tacotron 2 paper mentions this in the following line: "We then train our modified WaveNet on the ground truth-aligned predictions of the feature prediction network."

Yeongtae commented 6 years ago

I accepted and applied your advice, but I can't fix it. The error always occurs at the same location.

Yeongtae commented 6 years ago

After debugging, I saw a value of 298, which isn't 300. 300 is our hop size.

In my opinion, there is a bug in the '_adjust_time_resolution' function in "wavenet_vocoder/feeder.py".
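A mismatch like this can be narrowed down by comparing the audio length with the number of mel frames for a single training pair. The sketch below uses hypothetical helpers and made-up lengths; it is not the repo's '_adjust_time_resolution', and it assumes the feeder expects exactly hop_size audio samples per mel frame.

```python
def matches_hop_size(num_samples, num_frames, hop_size):
    """Return True when the audio has exactly hop_size samples per mel frame."""
    return num_samples == num_frames * hop_size


def samples_per_frame(num_samples, num_frames):
    """Return the integer samples-per-frame ratio, or None when the
    audio length is not a whole multiple of the frame count."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    quotient, remainder = divmod(num_samples, num_frames)
    return quotient if remainder == 0 else None


# Illustrative lengths only (not taken from the actual dataset)
assert matches_hop_size(30000, 100, 300)
assert not matches_hop_size(29800, 100, 300)
assert samples_per_frame(29800, 100) == 298
```

Running a check like this over every preprocessed pair shows whether the mismatch comes from the stored files or from the code that adjusts them at training time.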

Rayhane-mamah commented 6 years ago

This has been permanently fixed by going back to the source of the initial problem. I have manually tested Wavenet with GTA and non-GTA synthesis on 4 different datasets without problems. For that matter, I added a "test_wavenet_feeder.py" file to test the compatibility of all training files prior to training.

If the issue persists, it's most likely due to a misuse of the model; please let me know how things go for you. (GTA synthesis has changed, so make sure to rerun it if you try GTA training. It runs 40x faster than our previous implementation, so redoing it shouldn't be a problem.)

Feel free to reopen if it persists.

Rayhane-mamah / Tacotron-2

Test "python train.py --model='WaveNet' ", Assertion error. #132