Long words - right bucket size/parameters ?

cmusphinx / g2p-seq2seq

G2P with Tensorflow


Long words - right bucket size/parameters ? #196

Open entenbein opened 4 years ago

entenbein commented 4 years ago

Hi folks,

I trained a model for German and now I'm struggling with the predicted output for longer words (e.g. 39 letters ≙ 34 phones, yeah, German...). For these words, the last phones of the predicted transcription are repeated over and over again.
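To make the symptom concrete, here is a minimal sketch for flagging predictions whose tail repeats. It assumes each prediction is a list of phone symbols; the helper name `trailing_repeats`, the `max_period` parameter, and the phone symbols in the example are hypothetical and not part of g2p-seq2seq.

```python
def trailing_repeats(phones, max_period=3):
    """Return (period, count) for the most-repeated unit at the end of `phones`.

    `period` is the length of the repeated unit in phones and `count` is how
    many times it occurs back to back at the tail. (0, 1) means no repetition.
    """
    best = (0, 1)
    for period in range(1, max_period + 1):
        if len(phones) < 2 * period:
            break  # too short to contain two copies of a unit this long
        unit = phones[-period:]
        count = 1
        end = len(phones) - period
        # Walk backwards while the preceding chunk equals the final unit.
        while end >= period and phones[end - period:end] == unit:
            count += 1
            end -= period
        if count > best[1]:
            best = (period, count)
    return best


# Hypothetical prediction whose last two phones loop three times.
prediction = "a b c t s t s t s".split()
period, count = trailing_repeats(prediction)
print(period, count)
```

Running this over a test set and sorting by `count` is one way to check whether the repetitions shrink after changing the training parameters.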

So for training I set max_length=50. The results got better, but there are still some phone repetitions.

How do the other two bucket parameters influence the predicted transcriptions?
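For context, this is a rough sketch of how geometric length bucketing is commonly done, so it is clear what such parameters would control. It is an assumption that g2p-seq2seq computes its buckets this way; the names `min_length_bucket` and `length_bucket_step` and the values 8 and 1.25 are used here for illustration only.

```python
import bisect


def bucket_boundaries(max_length, min_length_bucket=8, length_bucket_step=1.25):
    """Return increasing bucket boundaries below max_length.

    Each boundary is roughly length_bucket_step times the previous one,
    so short sequences get fine-grained buckets and long ones coarse buckets.
    """
    if length_bucket_step <= 1.0:
        raise ValueError("length_bucket_step must be greater than 1.0")
    boundaries = []
    x = min_length_bucket
    while x < max_length:
        boundaries.append(x)
        # Always advance by at least 1 so the loop terminates.
        x = max(x + 1, int(x * length_bucket_step))
    return boundaries


def bucket_index(length, boundaries):
    """Index of the bucket a sequence of this length falls into."""
    return bisect.bisect_right(boundaries, length)


boundaries = bucket_boundaries(max_length=50)
print(boundaries)                    # [8, 10, 12, 15, 18, 22, 27, 33, 41]
print(bucket_index(39, boundaries))  # 8, i.e. lengths 33 to 40
```

Under these assumed values, a 39-letter word shares a bucket with all words of 33 to 40 letters. A smaller step gives narrower buckets for long words, but each bucket then holds fewer training examples.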

Thanks a lot!

nshmyrev commented 4 years ago

You'd better try a modern transformer architecture, not seq2seq.

entenbein commented 4 years ago

Alright, which ones would you suggest?

nshmyrev commented 4 years ago

Maybe https://github.com/hajix/G2P