Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

The reading differs for input containing spelling errors #437

Open kohei0912 opened 5 years ago

kohei0912 commented 5 years ago

Thank you for the great implementation. I used Rayhane-mamah's Tacotron code and r9y9's WaveNet code and trained the two models separately. Next, I tried to synthesize the sentence "Thisss isrealy awhsome.", which contains typos. I want Tacotron-2 to read it as "This is really awesome.", i.e. to be robust to spelling errors like DeepMind's original model.
・With the 105k-step Tacotron model, it says "This is really "
・With a Tacotron model trained for over 165k steps, it says "Thisss isrealy "
Audio samples: audiosamples.zip
When I used r9y9's pretrained Tacotron model, it also says "This is really ***".

Why is this happening, and what might be related to it? Any ideas would be appreciated.

Training setup: I used the default Tacotron code and trained from scratch on LJSpeech. I kept the hparams mostly at their defaults, but adjusted some audio properties to work with r9y9's WaveNet, and set use_lws = True and symmetric_mels = False.
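symmetric_mels controls the range the mel spectrogram is normalized to before training. A minimal sketch, assuming a linear mapping of dB values from [min_level_db, 0] onto the target range with clipping; min_level_db = -100 and max_abs_value = 4 are illustrative values, so check hparams.py for the ones actually in use:

```python
import numpy as np

# Illustrative values (assumptions); the real ones live in hparams.py.
MIN_LEVEL_DB = -100.0
MAX_ABS_VALUE = 4.0


def normalize(S_db: np.ndarray, symmetric: bool, max_abs: float = MAX_ABS_VALUE,
              min_level_db: float = MIN_LEVEL_DB) -> np.ndarray:
    # Assumed linear dB mapping: [min_level_db, 0] -> [0, 1] before rescaling.
    scaled = (S_db - min_level_db) / (-min_level_db)
    if symmetric:
        # symmetric_mels = True: target range [-max_abs, max_abs]
        return np.clip(2 * max_abs * scaled - max_abs, -max_abs, max_abs)
    # symmetric_mels = False: target range [0, max_abs]
    return np.clip(max_abs * scaled, 0.0, max_abs)


S_db = np.array([-120.0, -100.0, -50.0, 0.0])
print(normalize(S_db, symmetric=True))
print(normalize(S_db, symmetric=False, max_abs=1.0))
```

If the vocoder expects mels in [0, 1] (an assumption worth checking against r9y9's audio preprocessing), the non-symmetric setting with max_abs_value = 1 is the one that matches; a range mismatch between Tacotron outputs and WaveNet inputs degrades audio but would not by itself change which words are read.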

kohei0912 commented 5 years ago

With r9y9's model, the alignment looks like this: [image: alignment_r9y9]
With my 165k-step model, the alignment is different: [image: alignment_mine]
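To compare alignment plots like these numerically rather than by eye, one option is a few simple scores over the attention matrix. A minimal sketch, assuming the alignment is available as a NumPy array of shape (decoder_steps, encoder_steps); alignment_scores and its metric names are hypothetical, not something the repo computes:

```python
import numpy as np


def alignment_scores(alignment: np.ndarray) -> dict:
    """Hypothetical diagnostics for an alignment of shape (decoder_steps, encoder_steps)."""
    peaks = alignment.max(axis=1)         # how sharply each decoder step attends
    positions = alignment.argmax(axis=1)  # encoder index each decoder step focuses on
    steps = np.diff(positions)
    return {
        "focus_rate": float(peaks.mean()),
        # Fraction of decoder transitions whose focus does not move backwards.
        "monotonic_fraction": float((steps >= 0).mean()) if steps.size else 1.0,
        # Fraction of encoder positions (characters) that ever receive the peak.
        "coverage": float(np.unique(positions).size / alignment.shape[1]),
    }


# Toy clean diagonal: 8 decoder frames over 4 characters.
diag = np.zeros((8, 4))
diag[np.arange(8), np.arange(8) // 2] = 1.0
print(alignment_scores(diag))  # {'focus_rate': 1.0, 'monotonic_fraction': 1.0, 'coverage': 1.0}

# Toy alignment that stalls and never reaches the last characters.
stall = np.zeros((6, 4))
stall[np.arange(6), [0, 0, 1, 1, 1, 1]] = 1.0
print(alignment_scores(stall))  # {'focus_rate': 1.0, 'monotonic_fraction': 1.0, 'coverage': 0.5}
```

Low coverage flags attention that never visits part of the input, which is one thing to look for when the synthesized speech stops before the end of the sentence.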