Closed by lolotica123 4 years ago
@xiuyayang did you normalize the text before training? It seems there are some problems with the preprocessing step here :D
@dathudeptrai, no, I didn't. I just followed the README step by step:
tensorflow-tts-preprocess, tensorflow-tts-normalize, train Tacotron 2, then extract durations to train FastSpeech 2. But when I synthesize, both Tacotron 2 and FastSpeech 2 generate the same audio... Can you let me know what I should do? Thanks so much.
I think the problem is in the text normalization step before training. Maybe you should lowercase the text and convert numbers to their spoken form.
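For reference, the kind of text normalization suggested here could look roughly like the sketch below. It is not code from TensorFlowTTS: the function name `normalize_text` and the `_DIGITS` table are hypothetical, and reading numbers digit by digit is a deliberate simplification (a real Vietnamese normalizer would read "12" as "mười hai"). The NFC step is included because Vietnamese letters can be stored either precomposed or as base letter plus combining marks, and the two forms do not compare equal.

```python
import re
import unicodedata

# Hypothetical digit table: digits are read one by one, which is a
# simplification and not how whole numbers are normally spoken.
_DIGITS = {
    "0": "không", "1": "một", "2": "hai", "3": "ba", "4": "bốn",
    "5": "năm", "6": "sáu", "7": "bảy", "8": "tám", "9": "chín",
}


def normalize_text(text: str) -> str:
    """Lowercase the text, compose accents (NFC) and spell out ASCII digits."""
    # Compose base letters and combining marks into single code points.
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    # Replace each ASCII digit with its word, padded with spaces.
    text = re.sub(r"[0-9]", lambda m: " " + _DIGITS[m.group()] + " ", text)
    # Collapse runs of whitespace introduced above.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Hai Cái Đầu 12"))
```

Running the same function over every transcript before preprocessing keeps the training text and the text passed in at synthesis time in the same form.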
Thanks @dathudeptrai for your fast response. The texts in metadata.csv are already lowercase and contain no numbers. Why does ljspeech_mapper.json in dump_ljspeech come out the same when I use _letters = "AÁÀẠẢÃ.." instead of _letters = "ABCDEFGHIJKLMN.." and comment out valid_symbols? Could you share a guide for training Tacotron 2 in Vietnamese? Thanks so much.
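One thing worth checking in a situation like this is whether every character in the transcripts is actually present in the symbol set. The sketch below is only an illustration and not part of the repo: `uncovered_characters` is a hypothetical helper, and the short `symbols` list stands in for whatever the processor really defines. Both sides are NFC-normalized first, since a precomposed "á" and a decomposed "a" plus combining accent would otherwise be counted as different characters.

```python
import unicodedata
from collections import Counter


def uncovered_characters(texts, symbols):
    """Count transcript characters that do not appear in the symbol set."""
    known = {unicodedata.normalize("NFC", s) for s in symbols}
    missing = Counter()
    for text in texts:
        for ch in unicodedata.normalize("NFC", text):
            if ch not in known:
                missing[ch] += 1
    return dict(missing)


# Hypothetical, shortened symbol list: uppercase letters only, plus a space.
symbols = list("AÁÀẠẢÃHICĐU") + [" "]
print(uncovered_characters(["hai cái đầu"], symbols))
```

If the result is not empty, those characters have no ID in the mapper, so it may help to extend the symbol list (for example with the lowercase forms) and rerun preprocessing before training.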
@xiuyayang you should define your dataset parameter before preprocessing (see https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L347-L359 and https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L369)
I did exactly what you suggested, but the synthesized audio still sounds bad. Have you trained text-to-speech in Vietnamese? Could you share your ljspeech.py script from the processor...? Thanks so much.
Hi @lolotica123, I know this issue was closed a long time ago, but did you solve it? I just trained Tacotron 2 on my own Vietnamese dataset and ran into the same problem as you. Thank you very much.
Hi @dathudeptrai, thanks for the great text-to-speech repo. I have a question about preprocessing for ljspeech. My .csv data has the format 12_wav | | hai cái đầu sẽ nghĩ ra những cái mà một cái đầu không nghĩ ra nổi, and I changed the letters to _letters = "AÁÀẠẢÃĂẰẮẶẲẴÂẤẦẬẨẪBCDĐEÉÈẺẼẸÊẾỀỂỄỆFGHIÍÌỊỈĨJKL... Training finished at 40k, but when I synthesize, the audio and transcript