janbijster closed this issue 3 years ago
I think it has to do with the fact that I extracted the durations with the mfa_extraction method.
After inspecting the samples, I get the idea that the error is caused by the input to the model, that is, the ids denoting the characters/phonemes. The ids in ids.npy seem to correspond to the characters in the text line, while the durations seem to correspond to phonemes.
For example: the first utterance has 50 characters, which MFA converts to 31 phonemes. The ids.npy file (generated by tensorflow-tts-preprocess) contains 50 elements, while the durations.npy file (generated by MFA) contains 31 elements.
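The comparison described above can be sketched as a small check. This is only an illustration: the function names are hypothetical, and the `<utt_id>-ids.npy` / `<utt_id>-durations.npy` file naming used by the loader is an assumption about the dump layout, not something confirmed by the repo.

```python
from pathlib import Path


def find_length_mismatches(id_lengths, duration_lengths):
    """Return {utt_id: (n_ids, n_durations)} for utterances whose counts differ."""
    mismatches = {}
    for utt_id, n_ids in id_lengths.items():
        n_durations = duration_lengths.get(utt_id)
        # Utterances missing from one side are skipped rather than reported.
        if n_durations is not None and n_ids != n_durations:
            mismatches[utt_id] = (n_ids, n_durations)
    return mismatches


def load_lengths(directory, suffix):
    """Map utterance id -> number of elements for every file ending in `suffix`.

    Assumes (hypothetically) files named '<utt_id><suffix>', e.g. suffix='-ids.npy'.
    """
    import numpy as np  # only needed when reading real .npy files

    directory = Path(directory)
    return {
        path.name[: -len(suffix)]: int(np.load(path).shape[0])
        for path in sorted(directory.glob(f"*{suffix}"))
    }


if __name__ == "__main__":
    # Numbers from the first utterance described above: 50 ids vs. 31 durations.
    print(find_length_mismatches({"utt_0001": 50}, {"utt_0001": 31}))
```

With real data, the two dictionaries would come from `load_lengths` pointed at the ids and durations folders; any non-empty result means the ids and durations were produced from different token sequences.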
I can now confirm this was the error. I extracted durations using a pretrained tacotron2 model and these had the right number of elements.
Hi, first of all: thank you very much for this valuable repo!
When I train the fastspeech2 model on my data, I keep running into the same error:
I don't believe the message about cache is the problem; it disappears when I turn off allow_cache. I think the relevant line is
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,31,384] vs. [16,50,384]
If I shuffle the data, the middle numbers (31 and 50) in the error change (a different sample), but they always differ by a factor of roughly 1.5.
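For reference, this kind of message appears when an element-wise operation receives two tensors whose dimensions are neither equal nor 1. A minimal sketch of that broadcasting rule, using a hypothetical helper in plain Python rather than anything from the library:

```python
def broadcast_compatible(shape_a, shape_b):
    """True if two shapes can be combined element-wise under broadcasting rules."""
    # Compare trailing dimensions first; missing leading dimensions count as 1.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True


# Shapes taken from the error message: the middle (sequence) axis disagrees.
print(broadcast_compatible((16, 31, 384), (16, 50, 384)))  # False
print(broadcast_compatible((16, 50, 384), (16, 50, 384)))  # True
```

Since the batch size (16) and the last dimension (384) agree, only the sequence-length axis is inconsistent between the two tensors.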
I checked the durations of the data: for all samples, the sum of elements in ...durations.npy is equal to the size of the first dimension of ...norm-feats.npy, ...raw-energy.npy and ...raw-f0.npy. My sound samples have a sample rate of 16 kHz.
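The check described above amounts to something like the following sketch (the function name is hypothetical, and the arrays would be loaded from the .npy files mentioned). Note that it only validates frame-level alignment: it passes as long as the durations sum to the number of frames, regardless of how many duration entries there are.

```python
def duration_frame_gap(durations, n_frames):
    """Return sum(durations) - n_frames; 0 means the durations cover every frame."""
    return sum(int(d) for d in durations) - int(n_frames)


if __name__ == "__main__":
    # Toy values, not taken from the dataset: three durations covering 12 frames.
    print(duration_frame_gap([3, 4, 5], 12))
```

A result of 0 for every sample is consistent with what is reported here, but it says nothing about whether the number of duration entries matches the number of input ids.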
For extracting durations, I used the examples/mfa_extraction/ scripts and followed the steps in the readme. When extracting durations, I ran txt_grid_parser.py with --sample-rate 16000. I used the following configuration for preprocessing:
Then I ran the preprocess and normalization steps and then fix_mismatch.
Then I tried training with the following configuration: (sorry for the wall of text)
I tried with remove_short_samples and trim_silence on and off, and with TensorFlow 2.4 and 2.3, on GPU and CPU, but no luck. I also tried with another dataset, a subset of LibriTTS. For this I changed the sample rate to 24000 and hop_size to 300, but I run into the same Incompatible shapes: error. Do you have any idea what could cause this?