NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

[Tacotron2/TRTIS] Is it possible to support a non-English language like Chinese in trtis_cpp? #767

Status: Open. RaymondTsao opened this issue 3 years ago.

RaymondTsao commented 3 years ago

Describe the bug

I have already trained an English & Chinese bilingual Tacotron model on my own data, using the code at: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2

Its inference output is fine. My inference phrases look like this: ji2-jiang1 wei4-nin2 bo1-fang4 pau yi4-shu4 wu3-dao4 pau gu3-ba1 dang1-dai4 wu3-dao4-tuan2 pau DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S, and I can understand what the output audio says.
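
The phrase above mixes tone-numbered pinyin syllables (ji2-jiang1), pause tokens (pau), and ARPAbet-style English phones with stress digits (AE1), with spaces between words and hyphens inside them. The Python sketch below splits such a phrase into labeled units. The patterns are inferred from this single example and are an assumption, not an input format documented by the repository.

```python
import re

# Unit patterns inferred from the one example phrase in this issue; they are an
# assumption, not a documented input format for Tacotron2 or trtis_cpp.
PINYIN = re.compile(r"[a-z]+[1-5]")        # e.g. "wu3": lower-case syllable + tone digit
ARPABET = re.compile(r"[A-Z]{1,2}[0-2]?")  # e.g. "AE1": upper-case phone + optional stress


def classify(unit: str) -> str:
    """Label one hyphen-separated unit as a pause, a pinyin syllable, or a phone."""
    if unit == "pau":
        return "pause"
    if PINYIN.fullmatch(unit):
        return "pinyin"
    if ARPABET.fullmatch(unit):
        return "phone"
    raise ValueError(f"unrecognized unit: {unit!r}")


def split_phrase(phrase: str) -> list[list[tuple[str, str]]]:
    """Split on spaces into words, then on hyphens into labeled units."""
    return [[(classify(u), u) for u in word.split("-")] for word in phrase.split()]


if __name__ == "__main__":
    for word in split_phrase("gu3-ba1 pau D-AE1-N-S"):
        print(word)
```

Raising on unknown units, rather than skipping them, makes gaps in the symbol inventory visible instead of silently dropping sounds.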

Then, to speed up inference, I followed the steps in the trtis_cpp folder to export my own Tacotron model to ONNX and TensorRT: https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp. The conversion succeeded.

However, I can't understand what the inference output says; it sounds like an alien language. :( I found that possible syllables can be added to model-config/tacotron2waveglow/mapping.txt, but even after adding all the syllables and rebuilding, the output audio still sounds like nonsense.
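
One thing that may be worth checking, sketched here under stated assumptions rather than as a confirmed diagnosis: the deployed engine can only reproduce the trained model's behavior if it feeds the network the same symbol IDs that were used during training. The Python sketch below compares a training-time table, built by assigning IDs by position in an ordered symbol list (as the PyTorch Tacotron2 text code does with its symbols list), against a deployed table. The two-column "symbol id" layout that parse_two_column reads is hypothetical; the actual layout of mapping.txt has not been verified here, and the symbol lists are toy examples.

```python
# Hypothetical consistency check between training-time and deployed symbol IDs.
# ASSUMPTION: the deployed table is a text file with one "symbol id" pair per
# line; this is for illustration and is not the verified format of mapping.txt.


def training_table(symbols: list[str]) -> dict[str, int]:
    """Assign IDs by position in the ordered training symbol list."""
    return {s: i for i, s in enumerate(symbols)}


def parse_two_column(text: str) -> dict[str, int]:
    """Parse lines of the form '<symbol> <id>' into a dict, skipping blank lines."""
    table: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        symbol, idx = line.strip().rsplit(maxsplit=1)
        table[symbol] = int(idx)
    return table


def diff_tables(trained: dict[str, int], deployed: dict[str, int]) -> tuple[list[str], list[str]]:
    """Return (symbols missing from the deployed table, symbols whose IDs differ)."""
    missing = sorted(s for s in trained if s not in deployed)
    mismatched = sorted(s for s in trained if s in deployed and deployed[s] != trained[s])
    return missing, mismatched


if __name__ == "__main__":
    # Toy data: "wu3" has a different ID and "dao4" is absent in the deployed table.
    trained = training_table(["_", "pau", "wu3", "dao4"])
    deployed = parse_two_column("_ 0\npau 1\nwu3 3\n")
    print(diff_tables(trained, deployed))  # (['dao4'], ['wu3'])
```

If either list is non-empty, the engine would be giving the network inputs that differ from those it saw in training.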

So, is it possible to support a non-English language such as Chinese in trtis_cpp? Are there files I could modify to do this?

ma-siddiqui commented 3 years ago

Hi,

We are facing the same issue for Arabic. Could we get an update on this?

Thanks, Muhammad Ajmal Siddiqui

guyqaz commented 3 years ago

Hi,

We are facing the same issue for Vietnamese.

Thanks, Thuy Tran

R7788380 commented 2 years ago

@RaymondTsao I would like to train an English & Chinese bilingual TTS model. What format are your datasets in? Does a part of your inference phrase such as wu3-dao4 mean 舞蹈 ("dance") in Chinese? If so, how do I transform all the text into this form?

And is DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S an English phoneme sequence?

RaymondTsao commented 2 years ago

@R7788380

Hi, I have my own Chinese & English parser and dictionary to do this. You could convert Mandarin text to symbols like these with a Python module such as pinyin, but its output won't include the parser information.
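
The approach described above combines a parser with a pronunciation dictionary. The Python sketch below illustrates only the dictionary half, using greedy longest-match segmentation, which is our choice of technique and not necessarily the one RaymondTsao uses. LEXICON is a tiny hypothetical sample whose transcriptions are copied from the example phrase in this thread; a real system would load a full dictionary and decide how to handle characters it does not cover.

```python
# Hypothetical toy lexicon; transcriptions copied from the example phrase in this issue.
LEXICON: dict[str, list[str]] = {
    "即将": ["ji2", "jiang1"],
    "为您": ["wei4", "nin2"],
    "播放": ["bo1", "fang4"],
    "艺术": ["yi4", "shu4"],
    "舞蹈": ["wu3", "dao4"],
    "古巴": ["gu3", "ba1"],
    "当代": ["dang1", "dai4"],
    "舞蹈团": ["wu3", "dao4", "tuan2"],
}


def to_pinyin_words(text: str, lexicon: dict[str, list[str]]) -> list[str]:
    """Segment text by greedy longest match and join each word's syllables with '-'."""
    max_len = max(len(word) for word in lexicon)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "舞蹈团" wins over "舞蹈".
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i : i + n]
            if chunk in lexicon:
                words.append("-".join(lexicon[chunk]))
                i += n
                break
        else:
            raise ValueError(f"no lexicon entry matches {text[i]!r} at position {i}")
    return words


if __name__ == "__main__":
    print(" ".join(to_pinyin_words("古巴当代舞蹈团", LEXICON)))  # gu3-ba1 dang1-dai4 wu3-dao4-tuan2
```

Greedy longest match is simple but can mis-segment ambiguous strings, which is one reason a separate parser can help.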

Yes, DH-AX0 S-AE1-K-R-AO0-L D-AE1-N-S is an English phoneme sequence; you can get it from CMUdict.
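
For the English side, a CMUdict lookup can be as small as the Python sketch below. SAMPLE holds a few illustrative entries written in the layout of the classic cmudict-0.7b text file (a word followed by space-separated phones, ;;; comment lines, and (1)-style suffixes for alternate pronunciations); they are not copied from the real dictionary, so check it for actual pronunciations. Keeping only the first pronunciation of each word is a simplification.

```python
import re

# Illustrative entries in the cmudict-0.7b layout (not copied from the real file).
SAMPLE = """\
;;; illustrative entries
DANCE  D AE1 N S
THE  DH AH0
THE(1)  DH AH1
"""

VARIANT = re.compile(r"\(\d+\)$")  # "(1)", "(2)": alternate-pronunciation suffix


def load_cmudict(text: str) -> dict[str, list[str]]:
    """Map each upper-cased word to the phones of its first listed pronunciation."""
    pron: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        word, phones = line.split(maxsplit=1)
        word = VARIANT.sub("", word).upper()
        pron.setdefault(word, phones.split())  # later variants do not overwrite
    return pron


def word_to_units(word: str, pron: dict[str, list[str]]) -> str:
    """Join a word's phones with '-', matching the hyphenated style in this thread."""
    return "-".join(pron[word.upper()])


if __name__ == "__main__":
    pron = load_cmudict(SAMPLE)
    print(" ".join(word_to_units(w, pron) for w in ["the", "dance"]))  # DH-AH0 D-AE1-N-S
```

Note that AX is not part of CMUdict's standard phone set, so the AX0 in the phrase above presumably comes from the author's own dictionary rather than stock CMUdict.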

R7788380 commented 2 years ago

Thank you very much for your reply! I will try it.