NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

image of alignment is a horizontal line #209

Closed heshulin closed 5 years ago

heshulin commented 5 years ago

The encoder and decoder cannot align; the alignment image is a horizontal line.

rafaelvalle commented 5 years ago

Provide more information about your dataset and setup.

heshulin commented 5 years ago

Provide more information about the dataset and setup.

Dataset: https://www.data-baker.com (Chinese). I use the jieba library to convert the text to pinyin. Maybe I need to add tone information?

rafaelvalle commented 5 years ago

Tone information should not be necessary. Make sure you have proper cleaners and symbols.
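
One way to check the cleaners and symbols is to scan the cleaned transcripts for characters that are not in the symbol set, since such characters may be dropped or mishandled before they reach the model. The sketch below is only an illustration: the function name, the symbol set, and the sample transcripts are all hypothetical and are not taken from the repository.

```python
from collections import Counter


def find_unsupported_chars(transcripts, symbols):
    """Count characters in the transcripts that are missing from the symbol set."""
    known = set(symbols)
    missing = Counter()
    for text in transcripts:
        for ch in text:
            if ch not in known:
                missing[ch] += 1
    return missing


# Hypothetical symbol set: lowercase letters, space, and basic punctuation.
example_symbols = list("abcdefghijklmnopqrstuvwxyz ,.!?'-")

# Hypothetical pinyin transcripts; the second one carries tone digits.
example_transcripts = ["ni hao", "ni3 hao3 ma5"]

missing = find_unsupported_chars(example_transcripts, example_symbols)
print(dict(missing))
```

An empty result suggests the cleaned text and the symbol set agree; any reported character needs either a cleaner rule or a new entry in the symbol set.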

heshulin commented 5 years ago

@rafaelvalle I use text_cleaners=['transliteration_cleaners'] and have not modified the symbols.

rafaelvalle commented 5 years ago

Check that the data is correct and that you have no silence at the beginning and end of audio files.
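
As a rough illustration of removing silence at the beginning and end of a file, here is a minimal amplitude-threshold sketch in plain Python. The function name, the threshold of 0.01, and the frame length of 256 samples are assumptions chosen for the example, and the samples are assumed to be floats in [-1, 1]. In practice a library routine such as librosa.effects.trim does a similar job using a decibel threshold.

```python
def trim_silence(samples, threshold=0.01, frame_length=256):
    """Drop leading and trailing frames whose peak amplitude is below threshold."""
    n_frames = (len(samples) + frame_length - 1) // frame_length

    def is_loud(i):
        frame = samples[i * frame_length:(i + 1) * frame_length]
        return max(abs(s) for s in frame) >= threshold

    loud = [i for i in range(n_frames) if is_loud(i)]
    if not loud:
        return []  # the whole clip is below the threshold

    # Keep everything from the first loud frame to the end of the last loud frame.
    start = loud[0] * frame_length
    end = min((loud[-1] + 1) * frame_length, len(samples))
    return samples[start:end]


# Synthetic clip: 1000 silent samples, 1000 loud samples, 1000 silent samples.
audio = [0.0] * 1000 + [0.5, -0.5] * 500 + [0.0] * 1000
trimmed = trim_silence(audio)
print(len(audio), len(trimmed))
```

Trimming happens at frame granularity, so a small amount of silence (less than one frame) can remain on each side.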

heshulin commented 5 years ago

@rafaelvalle OK, VAD is necessary. Thank you very much.

rafaelvalle commented 5 years ago

Glad to hear that using voice activity detection solves your problem!

Approximetal commented 4 years ago

@rafaelvalle It was said here that the alignment is learned better if 5*hop_size is added at the end of the audio, but this answer says we should remove the silence. So I am confused: should the silence at the end of the audio file be added or removed? BTW, the alignment in my model is always a horizontal line. Are there any suggestions?

rafaelvalle commented 4 years ago

First check that there's no mismatch between the audio and the transcription. Remove silence from the beginning and end of your audio file and start from the pre-trained model. This should work.
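
One cheap heuristic for spotting audio/transcription mismatches is to compare each clip's duration with the length of its transcript and flag outliers. This is only a sketch under stated assumptions: the function name, the tuple layout, the factor-of-two tolerance, and the sample data are all hypothetical, and a flagged clip still needs to be checked by listening to it.

```python
from statistics import median


def flag_suspect_pairs(items, tolerance=2.0):
    """Flag utterances with an unusual seconds-per-character rate.

    items: list of (utterance_id, duration_seconds, transcript).
    Returns the ids whose rate differs from the median rate by more
    than a factor of `tolerance` in either direction.
    """
    rates = {uid: dur / max(len(text), 1) for uid, dur, text in items}
    mid = median(rates.values())
    return [uid for uid, rate in rates.items()
            if rate > mid * tolerance or rate < mid / tolerance]


# Hypothetical metadata: clip "d" is far too long for its short transcript.
example_items = [
    ("a", 2.0, "x" * 20),
    ("b", 3.0, "x" * 30),
    ("c", 2.5, "x" * 25),
    ("d", 9.0, "x" * 10),
]

suspects = flag_suspect_pairs(example_items)
print(suspects)
```

A clip that is much longer than its text suggests may contain untrimmed silence or the wrong transcript; one that is much shorter may be truncated.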