as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
https://as-ideas.github.io/TransformerTTS/

AssertionError when running extract_durations.py #56

Closed · napalm00 closed this issue 4 years ago

napalm00 commented 4 years ago
```
Traceback (most recent call last):
  File "extract_durations.py", line 160, in <module>
    fix_jumps=fix_jumps)
  File "/home/ubuntu/TransformerTTS/utils/alignments.py", line 131, in get_durations_from_alignment
    binary_attn, binary_score = binary_attention(ref_attention_weights)
  File "/home/ubuntu/TransformerTTS/utils/alignments.py", line 82, in binary_attention
    np.sum(attention_weights.T == attention_peak_per_phoneme, axis=0) != 1) == 0  # single peak per mel step
AssertionError
```

This happens when running

```
python extract_durations.py --config ../ljspeech_melgan_autoregressive_transformer/melgan --binary --fix_jumps --fill_mode_next
```

on an autoregressive model trained to step 1,110,000 on a new dataset (restored from the 900k checkpoint of the released model weights, commit 1c1cb03).

Also happens when using just the released 900k checkpoint with no training on the new dataset.

Any ideas what might be wrong? Does it need more training?

cfrancesco commented 4 years ago

Try re-running the script; there is some randomness at prediction time as well. This happens because the attention weights are sometimes exactly equal on multiple timesteps. Add or change the seed to get more control.
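
To make the failure mode concrete, here is a minimal sketch of the check that raises in utils/alignments.py, reconstructed from the traceback above; the (mel_steps, phoneme_steps) shape and the variable names are assumptions, not the repository's exact code. The attention matrix is binarized by keeping the peak phoneme at each mel step, and the assert requires that peak to be unique, so an exact tie between two phonemes at the same mel step makes the count differ from 1 and raises the AssertionError:

```
import numpy as np

def binary_attention_sketch(attention_weights):
    # attention_weights: (mel_steps, phoneme_steps) alignment matrix (assumed shape)
    peak_per_mel_step = attention_weights.max(axis=1)    # highest weight at each mel step
    is_peak = attention_weights.T == peak_per_mel_step   # which phonemes reach that peak
    # same structure as the assert in the traceback: exactly one peak per mel step
    assert np.sum(np.sum(is_peak, axis=0) != 1) == 0  # single peak per mel step
    return is_peak.T.astype(int)                         # one-hot alignment per mel step

ok = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1]])    # unique maximum in every row
tie = np.array([[0.7, 0.2, 0.1],
                [0.4, 0.4, 0.2]])   # two phonemes share the maximum in the second row

binary_attention_sketch(ok)    # passes
binary_attention_sketch(tie)   # raises AssertionError
```

Because the predicted attention weights vary from run to run, such exact ties may disappear on a re-run, which is why setting or changing the seed (e.g. the NumPy and TensorFlow seeds, assuming the TensorFlow stack this repo is built on) gives more control over whether extraction succeeds.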

napalm00 commented 4 years ago

My mistake: I had run create_dataset.py with the ljspeech_melgan_forward_transformer configs instead of the (correct) ljspeech_melgan_autoregressive_transformer configs. Re-creating the dataset with the proper configs and then running extract_durations.py worked as expected.
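
For reference, a hedged sketch of the corrected sequence; it assumes create_dataset.py accepts the same --config argument as the extract_durations.py call above, and the config path simply mirrors the one used earlier in this thread (adjust to your setup):

```
# re-create the dataset with the autoregressive configs (path copied from the command above)
python create_dataset.py --config ../ljspeech_melgan_autoregressive_transformer/melgan

# then re-run duration extraction against the same configs
python extract_durations.py --config ../ljspeech_melgan_autoregressive_transformer/melgan --binary --fix_jumps --fill_mode_next
```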