Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Model can't pronounce abbreviation. #102

Closed ishansan38 closed 6 years ago

ishansan38 commented 6 years ago

Hi. I tried running Tacotron-2 on abbreviations like CEO and it was not able to correctly pronounce them. Is there any way to fix this?

Rayhane-mamah commented 6 years ago

Tacotron is trained on normalized text, i.e., everything is explicitly spelled out for the model, so the behavior you're reporting is expected. The easiest fix would be to write a normalization tool along the lines of tacotron/util/text.py. It is essential that everything is written the way the model is expected to read it. In your case, a simple conversion to "C E O" will do fine. Cheers!
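A minimal sketch of that kind of normalization step, assuming acronyms appear as standalone runs of capital letters. The helper name `expand_acronyms` and the regex are hypothetical and are not taken from the repository's text module:

```python
import re

# Hypothetical helper, not part of the repository: matches standalone
# runs of two or more capital letters, e.g. "CEO".
_ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")


def expand_acronyms(text):
    """Insert a space between the letters of every all-caps token."""
    return _ACRONYM_RE.sub(lambda m: " ".join(m.group(0)), text)


print(expand_acronyms("The CEO met the press."))
```

A rule this blunt also spells out acronyms that are normally read as words, as well as any text written entirely in capitals, so an explicit whitelist of abbreviations may be the safer choice.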

On Sat, 14 Jul 2018, 20:56 Ishan Sharma, notifications@github.com wrote:

Hi. I tried running Tacotron-2 on abbreviations like CEO and it was not able to correctly pronounce them. Is there any way to fix this?


kokimame commented 6 years ago

I had the same issue (#61). Ultimately, a phoneme-based model, one that takes phonemes as input, will solve your problem. For this approach, you need a way to convert words into their phoneme representations. The simplest option is the CMU Pronouncing Dictionary; e.g., in your case, CEO would be converted to "S IY IY OW" by the dictionary and would probably be pronounced correctly. But there are many out-of-vocabulary words whose phoneme representations are not in the dictionary. One way to deal with these is to use the character-based model you already have for just those words, as discussed in the Deep Voice 3 paper. The other is to use a grapheme-to-phoneme converter to force every word into phonemes.
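The dictionary lookup with a character fallback could be sketched roughly as follows. `TOY_LEXICON` is a hand-written stand-in with stress markers omitted, not the real CMU dictionary file, and `words_to_units` is a hypothetical name; how the resulting units are fed to a model is left open here:

```python
import re

# Toy stand-in for a pronouncing dictionary (stress markers omitted).
# A real setup would load the full CMU Pronouncing Dictionary instead.
TOY_LEXICON = {
    "CEO": ["S", "IY", "IY", "OW"],
    "THE": ["DH", "AH"],
}


def words_to_units(text, lexicon):
    """Return one (kind, units) pair per word in the text.

    kind is "phonemes" when the word is found in the lexicon;
    otherwise it is "chars" and the units are the word's own letters.
    """
    result = []
    for word in re.findall(r"[A-Za-z']+", text):
        phones = lexicon.get(word.upper())
        if phones is not None:
            result.append(("phonemes", list(phones)))
        else:
            # Out-of-vocabulary word: fall back to characters.
            result.append(("chars", list(word)))
    return result


for kind, units in words_to_units("The CEO resigned", TOY_LEXICON):
    print(kind, units)
```

Replacing the `chars` branch with a call to a grapheme-to-phoneme converter would give the second option, where every word ends up as phonemes.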

Anyway, training a model with a new input format from scratch takes a lot of time and effort. Good luck!

Rayhane-mamah commented 6 years ago

I am assuming this question has been answered, closing this due to lack of activity.

If the problem persists, feel free to reopen :)