TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

How to train with a Vietnamese dataset #298

Closed: lolotica123 closed this issue 4 years ago

lolotica123 commented 4 years ago

Hi @dathudeptrai, thanks for the great text-to-speech repo. I have a question about preprocessing with the ljspeech processor. Each line of my .csv metadata has the format 12_wav | | hai cái đầu sẽ nghĩ ra những cái mà một cái đầu không nghĩ ra nổi, and I changed the symbol set to _letters = "AÁÀẠẢÃĂẰẮẶẲẴÂẤẦẬẨẪBCDĐEÉÈẺẼẸÊẾỀỂỄỆFGHIÍÌỊỈĨJKL... I trained to 40k steps and then synthesized audio for this transcript:

"Tuy nhiên, giữa đêm đã xảy ra vụ sạt lở đất tại vị trí hai căn phòng của lực lượng cứu hộ đang nghỉ. Sau đó, chỉ một số trong hơn 20 người may mắn thoát ra khỏi hiện trường vụ sạt lở và quay về"

The models I trained are Tacotron 2, FastSpeech 2, and Multi-band MelGAN. Thanks so much.
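The metadata row quoted above has an empty middle column. Whether that matters depends on which column the processor reads, which this thread does not state. A minimal sketch for auditing a pipe-delimited metadata file before preprocessing follows; `check_metadata`, the three-field layout, and the second sample row are assumptions for illustration and are not part of TensorFlowTTS.

```python
def check_metadata(lines, delimiter="|", expected_fields=3):
    """Return (line_number, problem) pairs for rows that look malformed.

    Assumes an LJSpeech-style layout of `id|text|normalized text`;
    adjust expected_fields if your processor expects something else.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        row = line.rstrip("\n")
        if not row.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in row.split(delimiter)]
        if len(fields) != expected_fields:
            problems.append(
                (number, f"expected {expected_fields} fields, got {len(fields)}")
            )
            continue
        for index, field in enumerate(fields):
            if not field:
                problems.append((number, f"field {index} is empty"))
    return problems


sample = [
    # Row format quoted in the issue: the middle column is blank.
    "12_wav | | hai cái đầu sẽ nghĩ ra những cái mà một cái đầu không nghĩ ra nổi",
    # Hypothetical well-formed row for comparison.
    "13_wav|xin chào|xin chào",
]
for number, problem in check_metadata(sample):
    print(number, problem)  # prints: 1 field 1 is empty
```

For a real file, pass an open file handle (for example `open("metadata.csv", encoding="utf-8")`) in place of `sample`.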

dathudeptrai commented 4 years ago

@xiuyayang did you normalize the text before training? There seems to be a problem in the preprocessing step :D

lolotica123 commented 4 years ago

@dathudeptrai, no, I didn't. I just followed the README step by step:

1. tensorflow-tts-preprocess
2. tensorflow-tts-normalize
3. Train Tacotron 2
4. Extract durations to train FastSpeech 2

But when I synthesize, both Tacotron 2 and FastSpeech 2 generate the same audio... Can you tell me what I should do? Thanks so much.

dathudeptrai commented 4 years ago

I think the problem is in the text normalization step before training. Maybe you should lowercase the text and convert numbers to their spoken form.
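The two suggestions above can be sketched as a small text cleaner. This is an illustration under stated assumptions, not the repository's cleaner: `normalize_text` is a hypothetical helper, the NFC step is an extra precaution for Vietnamese diacritics, and digits are read one at a time as a crude placeholder (so "20" becomes "hai không" rather than the natural reading "hai mươi").

```python
import re
import unicodedata

# Digit-by-digit Vietnamese readings. This is a placeholder only:
# a real normalizer should read whole numbers, dates, and units.
_DIGITS = {
    "0": "không", "1": "một", "2": "hai", "3": "ba", "4": "bốn",
    "5": "năm", "6": "sáu", "7": "bảy", "8": "tám", "9": "chín",
}


def normalize_text(text):
    """Lowercase, compose diacritics (NFC), and spell out ASCII digits."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace each ASCII digit with its spoken form, padded with spaces.
    text = re.sub(r"[0-9]", lambda m: f" {_DIGITS[m.group()]} ", text)
    # Collapse the whitespace introduced by the padding.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Chỉ một số trong hơn 20 người"))
```

Applying the same function to the training transcripts and to the text passed in at synthesis time keeps the two consistent.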

lolotica123 commented 4 years ago

Thanks @dathudeptrai for your fast response. The texts in metadata.csv are already lowercase and contain no numbers. Why does ljspeech_mapper.json in dump_ljspeech come out the same when I replace _letters = "ABCDEFGHIJKLMN.." with _letters = "AÁÀẠẢÃ.." and comment out valid_symbols? Can you share a guide to training Tacotron 2 on Vietnamese? Thanks so much.
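Independently of the mapper file, one way to test whether an edited symbol list covers the corpus is to list every character in the transcripts that the list lacks. The sketch below is a hedged illustration: `missing_symbols` and the short `symbols` list are hypothetical and do not reproduce the repository's `_letters`. It also shows a pitfall specific to accented text: the same word in decomposed (NFD) form contains combining marks that a list of precomposed (NFC) letters does not include.

```python
import unicodedata


def missing_symbols(texts, symbols):
    """Return the sorted characters found in texts but absent from symbols."""
    known = set(symbols)
    missing = set()
    for text in texts:
        missing.update(ch for ch in text if ch not in known)
    return sorted(missing)


# Hypothetical, deliberately tiny symbol list in precomposed (NFC) form.
symbols = list(unicodedata.normalize("NFC", "aáàảãạđ "))

word_nfc = unicodedata.normalize("NFC", "đá")
word_nfd = unicodedata.normalize("NFD", "đá")

# The precomposed word is fully covered by the list.
print(missing_symbols([word_nfc], symbols))
# The decomposed word exposes a combining acute accent (U+0301).
print([f"U+{ord(ch):04X}" for ch in missing_symbols([word_nfd], symbols)])
```

If this reports uncovered characters for the real transcripts, either the symbol list or the text normalization needs adjusting before the text is converted to IDs.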

dathudeptrai commented 4 years ago

@xiuyayang you should define your dataset parameter before preprocessing (see https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L347-L359 and https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L369)

lolotica123 commented 4 years ago

I did exactly what you suggested, but the synthesized audio still sounds bad. Have you trained text-to-speech on Vietnamese? Can you share the script for ljspeech.py in processor...? Thanks so much.

nqcccccc commented 2 years ago

Hi @lolotica123, I know this issue was closed a long time ago, but did you solve it? I just trained Tacotron 2 on my own Vietnamese dataset and ran into the same problem as you. Thank you very much.