NTT123 / vietTTS

Vietnamese Text to Speech library
MIT License
201 stars, 91 forks

How to handle english words in vietnamese text #13

Closed by phanan9225 3 years ago

phanan9225 commented 3 years ago

Hi, based on your repo and your answers, I have successfully built a Vietnamese text-to-speech app with my own dataset. It sounds very good in the majority of cases. But I am still stuck on how to handle English words (e.g., vaccine, morning) that appear in the text. I have created a list of English words mapped to their Vietnamese pronunciations (e.g., vaccine -> vắc xin), and I update it whenever new English words appear. However, this seems inefficient. Do you have any advice for this case? Thank you so much.
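For reference, the word-list approach described above can be sketched roughly as follows. This is a minimal illustration, not code from vietTTS: the names `LOANWORD_LEXICON` and `respell_english` are hypothetical, and the only entry is the vaccine example from this post.

```python
import re

# Hypothetical lexicon: English word -> Vietnamese-style respelling.
# Only the "vaccine" entry comes from the post; extend it by hand as needed.
LOANWORD_LEXICON = {
    "vaccine": "vắc xin",
}

# Runs of letters (Unicode-aware), so punctuation and digits are left alone.
WORD_RE = re.compile(r"[^\W\d_]+")


def respell_english(text, lexicon=LOANWORD_LEXICON):
    """Replace known English words with their Vietnamese respelling.

    Words that are not in the lexicon are returned unchanged.
    """
    def _substitute(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    return WORD_RE.sub(_substitute, text)


if __name__ == "__main__":
    print(respell_english("Tôi đã tiêm vaccine hôm qua."))
```

The lookup is case-insensitive, and any word missing from the table passes through untouched, which is exactly why the list has to keep growing as new English words show up.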

NTT123 commented 3 years ago

I'm glad that vietTTS works well on your dataset!

One solution that I can think of to handle English words is to convert both English words and Vietnamese words to a standard phoneme representation (IPA, for example). Then, train the duration model and acoustic model on the IPA phoneme representation.
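A rough sketch of what a shared phoneme front end might look like is below. It is only an illustration under stated assumptions: the two tiny pronunciation tables are hand-written toy entries (approximate transcriptions, Vietnamese tones omitted), and `VI_LEXICON`, `EN_LEXICON`, and `text_to_phonemes` are hypothetical names, not part of vietTTS. A real system would likely use a proper grapheme-to-phoneme tool for each language.

```python
import unicodedata

# Toy pronunciation tables, assumptions for illustration only.
# Transcriptions are approximate and Vietnamese tones are omitted.
VI_LEXICON = {
    "tôi": ["t", "o", "j"],
    "tiêm": ["t", "iə", "m"],
}
EN_LEXICON = {
    "vaccine": ["v", "æ", "k", "s", "iː", "n"],
}

WORD_BOUNDARY = "|"  # hypothetical separator symbol between words


def text_to_phonemes(text):
    """Map mixed Vietnamese/English text to one shared phoneme sequence.

    Vietnamese entries take priority; English is the fallback.
    Raises KeyError for words found in neither table.
    """
    text = unicodedata.normalize("NFC", text).lower()
    phonemes = []
    for word in text.split():
        if word in VI_LEXICON:
            sequence = VI_LEXICON[word]
        elif word in EN_LEXICON:
            sequence = EN_LEXICON[word]
        else:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(sequence)
        phonemes.append(WORD_BOUNDARY)
    return phonemes


if __name__ == "__main__":
    print(text_to_phonemes("Tôi tiêm vaccine"))
```

Both languages end up in the same symbol inventory, so the duration and acoustic models would see a single phoneme vocabulary instead of characters from two writing systems.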

phanan9225 commented 3 years ago

> I'm glad that vietTTS works well on your dataset!
>
> One solution that I can think of to handle English words is to convert both English words and Vietnamese words to a standard phoneme representation (IPA, for example). Then, train the duration model and acoustic model on the IPA phoneme representation.

Thanks for answering. With this solution, I think I need an additional audio dataset for English with the same voice as Vietnamese. Is that correct?

NTT123 commented 3 years ago

> With this solution, I think I need an additional audio dataset for English with the same voice as Vietnamese. Is that correct?

The best-case scenario, I think, is to have a dataset with English words and Vietnamese words in the same sentence.

phanan9225 commented 3 years ago

Thank you so much!

nampdn commented 2 years ago

> With this solution, I think I need an additional audio dataset for English with the same voice as Vietnamese. Is that correct?
>
> The best-case scenario, I think, is to have a dataset with English words and Vietnamese words in the same sentence.

Can you please give an example of two sentences?