phanan9225 closed this issue 3 years ago
I'm glad that vietTTS works well on your dataset!
One solution that I can think of to handle English words is to convert both English words and Vietnamese words to a standard phoneme representation (IPA, for example). Then, train the duration model and acoustic model on the IPA phoneme representation.
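As a rough illustration of the shared-phoneme idea, here is a minimal sketch of a front end that maps both Vietnamese and English words into one phoneme inventory before the text reaches the duration and acoustic models. Everything in it is an assumption made for illustration: the names `VI_LEXICON`, `EN_LEXICON`, and `to_phonemes` are hypothetical and not part of vietTTS, the IPA entries are approximate, and tones are left out for brevity. A real system would need a full lexicon or a grapheme-to-phoneme tool for each language.

```python
import unicodedata


def _nfc(s):
    # Vietnamese text can arrive composed or decomposed; normalize so lookups match.
    return unicodedata.normalize("NFC", s)


# Hypothetical toy lexicons with rough, illustrative IPA (tones omitted).
VI_LEXICON = {_nfc(k): v for k, v in {
    "tôi": ["t", "o", "j"],
    "tiêm": ["t", "iə", "m"],
}.items()}

EN_LEXICON = {
    "vaccine": ["v", "æ", "k", "s", "iː", "n"],
}


def to_phonemes(text):
    """Convert mixed Vietnamese/English text to one shared phoneme sequence."""
    phonemes = []
    for raw in _nfc(text).lower().split():
        word = raw.strip(".,!?")
        if not word:
            continue
        if word in VI_LEXICON:
            units = VI_LEXICON[word]
        elif word in EN_LEXICON:
            units = EN_LEXICON[word]
        else:
            # Fail loudly so missing words get added to a lexicon.
            raise KeyError(f"no phoneme entry for {word!r}")
        if phonemes:
            phonemes.append("_")  # word-boundary token
        phonemes.extend(units)
    return phonemes


print(to_phonemes("Tôi tiêm vaccine."))
```

Because both languages end up in the same symbol set, the models see a single vocabulary and do not need to know which language a word came from.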
Thanks for answering. With this solution, I think I need an additional audio dataset for English with the same voice as Vietnamese. Is that correct?
The best-case scenario, I think, is to have a dataset with English words and Vietnamese words in the same sentence.
Thank you so much!
Can you please give an example of two sentences?
Hi. Based on your repo and your answers, I have successfully built a Vietnamese text-to-speech app with my own dataset. It sounds very good in the majority of cases, but I am still stuck on how to handle English words (e.g., vaccine, morning) that appear in the text. I have created a list of English words, mapped each one to a Vietnamese pronunciation (e.g., vaccine - vắc xin), and I update the list whenever a new English word appears. However, this seems inefficient. Do you have any advice for me in this case? Thank you so much.
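For reference, the word-list approach described above can be sketched as a text-normalization step that runs before synthesis. This is only a minimal sketch under stated assumptions: `EN_TO_VI` and `respell_english` are hypothetical names, not part of vietTTS, and the only entry is the vaccine - vắc xin pair mentioned in this post.

```python
import re

# Hypothetical respelling table; extend it as new English words show up.
EN_TO_VI = {
    "vaccine": "vắc xin",
}


def respell_english(text, mapping=EN_TO_VI):
    """Replace known English words with their Vietnamese respelling."""
    if not mapping:
        return text
    # Longest keys first so longer entries win over shorter overlapping ones.
    words = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    # Look the match up in lowercase so "Vaccine" and "VACCINE" also hit.
    return pattern.sub(lambda m: mapping[m.group(1).lower()], text)


print(respell_english("Tôi đã tiêm Vaccine hôm qua."))
```

The table keys are assumed to be lowercase. The weakness is the one already noted in the post: every new English word needs a manual entry.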