TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

What should I do to add a Turkish TTS model? #572

Closed AyseEe501 closed 3 years ago

AyseEe501 commented 3 years ago

I would be glad if you could show me how to proceed with Mozilla Common Voice in order to create a Turkish TTS model. I really need a lot of help in this regard. Thank you in advance for your interest.

ZDisket commented 3 years ago

Mozilla Common Voice is a very varied dataset, which makes it suboptimal for training TTS; I remember something like this being said in the Mozilla TTS forums, but I can't find the thread. You'd be better off finding a large open dataset in Turkish and training on that.

Normally, the procedure is to decide whether you want a phoneme-based or text-based model, then create a processor, and then train the model.
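As a rough illustration of the "create a processor" step for a text-based (character-level) Turkish model, here is a minimal sketch. It does not use the TensorFlowTTS processor API; every name in it (`SYMBOLS`, `turkish_lower`, `text_to_sequence`) is hypothetical, and the symbol set is an assumption. It only shows the core idea: define a symbol inventory and map cleaned text to integer IDs.

```python
from typing import List

# Hypothetical symbol inventory; not taken from TensorFlowTTS.
PAD = "<pad>"
EOS = "<eos>"
LETTERS = "abcçdefgğhıijklmnoöprsştuüvyz"  # the 29 letters of the Turkish alphabet
PUNCTUATION = " .,!?'-"
SYMBOLS = [PAD] + list(LETTERS) + list(PUNCTUATION) + [EOS]
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish casing rules (I -> ı, İ -> i).

    Plain str.lower() maps "I" to "i", which is wrong for Turkish,
    so the two special letters are replaced before lowercasing.
    """
    return text.replace("I", "ı").replace("İ", "i").lower()


def text_to_sequence(text: str) -> List[int]:
    """Convert text to symbol IDs, dropping characters outside SYMBOLS."""
    ids = [SYMBOL_TO_ID[ch] for ch in turkish_lower(text) if ch in SYMBOL_TO_ID]
    ids.append(SYMBOL_TO_ID[EOS])  # mark the end of the utterance
    return ids
```

Note that this sketch silently drops digits and foreign letters; a real processor would normally expand numbers and abbreviations into words before the ID lookup.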

dathudeptrai commented 3 years ago

@monatis :D

monatis commented 3 years ago

Hi @AyseEe501, unfortunately CommonVoice is not a good fit for TTS training as it is crowd-sourced and of lower quality. We need to build a decent-quality dataset to train a model because we don't have one in Turkish. For example, you may record your own voice with a high-quality studio microphone and build your own dataset. I'll be hosting @ThorstenMueller, who did exactly that, at an event on June 2nd. You may tune in at 9pm (Turkey time) if you'd like to learn how you can do the same.

AyseEe501 commented 3 years ago

@monatis I understand, and thank you for the kind invitation. Of course I would like to attend. I am aware that I need to improve my knowledge of this subject, and that the Common Voice dataset is problematic in this regard. However, since we cannot leave home these days because of the pandemic, it is very difficult to prepare a dataset on my own, and I cannot find anyone to help me. Thank you very much for the information :)

AyseEe501 commented 3 years ago

@ZDisket Thank you for the information :)

dathudeptrai commented 3 years ago

@monatis I remember that you had a plan to release a Turkish TTS dataset?

monatis commented 3 years ago

@dathudeptrai that's true, but I realized that the recording quality was poor when I listened to it at maximum volume, so I decided to record a new one, and this is in progress 😄

AyseEe501 commented 3 years ago

@dathudeptrai that's true, but I realized that the recording quality was poor when I listened to it at maximum volume, so I decided to record a new one, and this is in progress 😄

Do you have an estimated completion time?

monatis commented 3 years ago

@AyseEe501 around one month or so.

thorstenMueller commented 3 years ago

@dathudeptrai that's true, but I realized that the recording quality was poor when I listened to it at maximum volume, so I decided to record a new one, and this is in progress 😄

Somehow it sounds familiar to me @monatis :smirk:

AyseEe501 commented 3 years ago

@AyseEe501 around one month or so.

I'm waiting impatiently then 😅🤗

wanwen1405 commented 3 years ago

Hi @dathudeptrai @monatis

Currently, the pretrained TTS models such as FastSpeech2 support the ljspeech, kss, baker, libritts, and thorsten datasets.

I would like to find out how I can train the model with a Singapore English dataset.

Singapore English comprises a mixture of words from many different languages: Malay, Tamil, dialects, English, and Chinese.

I already have my audio (WAV) files and transcript (metadata.csv) ready. They were recorded with professional equipment, and I have trimmed silences with sox and resampled everything to 22,050 Hz for training.

How can I contribute my dataset to this community? Could anyone recommend references or guides on how to do so? I specifically want to train a FastSpeech2 model on my dataset :)

Please let me know.
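Before training on a dataset like the one described above (WAV files plus a metadata.csv, resampled to 22,050 Hz), it can help to run a quick consistency check. The sketch below uses only the Python standard library and assumes an LJSpeech-style layout: a `wavs/` folder next to `metadata.csv`, with pipe-separated rows of the form `file_id|transcript`. That layout, the function name `check_dataset`, and the error messages are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import wave
from pathlib import Path
from typing import List

EXPECTED_RATE = 22050  # sample rate mentioned in the comment above


def check_dataset(root: str, expected_rate: int = EXPECTED_RATE) -> List[str]:
    """Return a list of human-readable problems found in the dataset."""
    root_path = Path(root)
    problems = []
    with open(root_path / "metadata.csv", encoding="utf-8", newline="") as f:
        # Assumed LJSpeech-style rows: file_id|transcript[|normalized transcript]
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if len(row) < 2 or not row[1].strip():
                problems.append(f"line {line_no}: missing transcript")
                continue
            wav_path = root_path / "wavs" / f"{row[0]}.wav"
            if not wav_path.is_file():
                problems.append(f"line {line_no}: {wav_path.name} not found")
                continue
            # The wave module reads uncompressed PCM WAV headers only.
            with wave.open(str(wav_path), "rb") as wav:
                rate = wav.getframerate()
            if rate != expected_rate:
                problems.append(f"line {line_no}: {wav_path.name} is {rate} Hz")
    return problems
```

Calling `check_dataset("path/to/dataset")` returns an empty list when every row has a transcript and a matching WAV file at the expected sample rate.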

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.