Rayhane-mamah / Tacotron-2

Google's Tacotron-2 TensorFlow implementation
MIT License

Tacotron alignments rapidly collapse to the end of encoder output sequences #232

Open · mutiann opened 5 years ago

mutiann commented 5 years ago

(alignment plot attached)

In my training on my own data, the loss converges rapidly, but so do the alignments: after merely 1000 steps, the attention puts a ~0.9 score on the last step of the encoder outputs for most decoder steps, and during synthesis (with Griffin-Lim) the Tacotron model produces similar alignments and human-like garbling that is completely unrelated to the input text. Does anyone have ideas about this? Thanks.
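One way to quantify this collapse from the alignment matrices (the same ones plotted in the attached figure / written to TensorBoard) is to measure how many decoder steps peak on the final encoder position. This is not code from the repo; `attention_collapse_fraction` is a hypothetical diagnostic helper, and it assumes the alignment is available as a NumPy array of shape `[decoder_steps, encoder_steps]`:

```python
import numpy as np

def attention_collapse_fraction(alignments, last_k=1):
    """Fraction of decoder steps whose attention peaks on the final
    `last_k` encoder positions; values near 1.0 mean the alignment has
    collapsed to the end of the encoder outputs."""
    peaks = alignments.argmax(axis=1)  # peak encoder index at each decoder step
    return float(np.mean(peaks >= alignments.shape[1] - last_k))

decoder_steps, encoder_steps = 60, 20

# Roughly diagonal (healthy) alignment: decoder step i attends to encoder step i // 3.
healthy = np.zeros((decoder_steps, encoder_steps))
healthy[np.arange(decoder_steps),
        np.minimum(np.arange(decoder_steps) // 3, encoder_steps - 1)] = 1.0

# Collapsed alignment: ~0.9 of the mass on the last encoder step at every decoder step.
collapsed = np.full((decoder_steps, encoder_steps), 0.1 / (encoder_steps - 1))
collapsed[:, -1] = 0.9

print(attention_collapse_fraction(healthy))    # small: only the final few steps peak at the end
print(attention_collapse_fraction(collapsed))  # 1.0: every decoder step peaks at the end
```

A fraction that climbs toward 1.0 within the first thousand steps matches the symptom described above, whereas a healthy run should keep it low as the diagonal forms.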

liangshuang1993 commented 4 years ago

@dy-octa Hi, have you solved this problem?