heshulin closed 5 years ago
Provide more information about your dataset and setup.
Dataset: https://www.data-baker.com (Chinese). I use the jieba library to convert the text to pinyin. Maybe I need to add tone information?
@rafaelvalle I use text_cleaners=['transliteration_cleaners'] and have not modified symbols.
Check that the data is correct and that you have no silence at the beginning and end of audio files.
@rafaelvalle OK, VAD is necessary. Thank you very much.
Glad to hear that using voice activity detection solves your problem!
@rafaelvalle It is said here that the model learns the alignment better if 5*hop_size is added at the end of the audio, but this answer says we should remove the silence. So I am confused: should the silence at the end of the audio file be added or removed? BTW, the alignment in my model is always a horizontal line. Do you have any suggestions?
First check that there's no mismatch between the audio and the transcription. Remove silence from the beginning and end of your audio file and start from the pre-trained model. This should work.
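For anyone who wants to try the trimming step, here is a minimal sketch of energy-based silence removal using only NumPy. It is not code from this repository: the function name `trim_silence` and the values for `top_db`, `frame_length`, and `hop_length` are assumptions for illustration and would likely need tuning per dataset. It keeps everything between the first and last frame whose RMS level is within `top_db` decibels of the loudest frame.

```python
import numpy as np


def trim_silence(wav, top_db=40.0, frame_length=1024, hop_length=256):
    """Drop leading/trailing audio quieter than `top_db` dB below the loudest frame.

    Hypothetical helper; thresholds and frame sizes are illustrative assumptions.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.size == 0:
        return wav

    # Frame-wise RMS energy over overlapping windows.
    n_frames = 1 + max(0, len(wav) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(wav[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

    peak = rms.max()
    if peak == 0:
        # The whole signal is digital silence; nothing to keep.
        return wav[:0]

    # Level of each frame relative to the loudest frame, in dB.
    db = 20.0 * np.log10(np.maximum(rms, 1e-10) / peak)
    voiced = np.nonzero(db > -top_db)[0]

    # Keep from the start of the first voiced frame to the end of the last one.
    start = voiced[0] * hop_length
    end = min(len(wav), voiced[-1] * hop_length + frame_length)
    return wav[start:end]
```

Usage would be something like `trimmed = trim_silence(wav)` on each clip before computing mel spectrograms. If librosa is already installed, `librosa.effects.trim` does a similar job; listening to a few trimmed files is a good sanity check that no speech was cut off.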
The encoder and decoder cannot align; the alignment plot is a horizontal line.