TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Android example using Tacotron2 and MB-MelGAN #228

Closed: Zak-SA closed this issue 4 years ago

Zak-SA commented 4 years ago

Will you share an Android example using Tacotron2 and MB-MelGAN anytime soon?

Thanks

dathudeptrai commented 4 years ago

@Zak-SA Tacotron-2 is too slow. Can you try FastSpeech?

Zak-SA commented 4 years ago

@dathudeptrai I will try FastSpeech, but it would still be good to have the Android example for Tacotron2.

Zak-SA commented 4 years ago

@dathudeptrai When I try to train FastSpeech2, I get this error:

    Traceback (most recent call last):
      File "examples/fastspeech2/train_fastspeech2.py", line 400, in <module>
        main()
      File "examples/fastspeech2/train_fastspeech2.py", line 307, in main
        train_dataset = CharactorDurationF0EnergyMelDataset(
      File "./examples/fastspeech2/fastspeech2_dataset.py", line 98, in __init__
        assert (
    AssertionError: Number of charactor, mel, duration, f0 and energy files are different

Can you help?

Thanks

kau1992 commented 4 years ago

@Zak-SA Did you extract durations using MFA or Tacotron-2? FastSpeech training requires durations.

Please check the links below.

Extract durations with MFA: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction

Extract durations from Tacotron-2: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/tacotron2#step-4-extract-duration-from-alignments-for-fastspeech
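For context on why durations are mandatory: a FastSpeech-style model has no encoder-decoder attention to align text with audio. Instead, a length regulator repeats each input-symbol encoding by that symbol's duration in mel frames, so an utterance's durations must sum to its number of mel frames. The sketch below illustrates the idea in NumPy under stated assumptions (one integer duration per input symbol, encodings shaped [num_symbols, hidden_dim]); `length_regulate` is a hypothetical helper, not a TensorFlowTTS function.

```python
import numpy as np


def length_regulate(encodings: np.ndarray, durations: np.ndarray, num_mel_frames: int) -> np.ndarray:
    """Expand per-symbol encodings to frame level (FastSpeech-style length regulator).

    encodings: array of shape [num_symbols, hidden_dim]
    durations: non-negative integer array of shape [num_symbols], mel frames per symbol
    num_mel_frames: length of the target mel-spectrogram for this utterance
    """
    if encodings.shape[0] != durations.shape[0]:
        raise ValueError("need exactly one duration per input symbol")
    if int(durations.sum()) != num_mel_frames:
        raise ValueError(
            f"durations sum to {int(durations.sum())} but the mel has {num_mel_frames} frames"
        )
    # Repeat row i of `encodings` durations[i] times along the time axis.
    return np.repeat(encodings, durations, axis=0)


if __name__ == "__main__":
    enc = np.arange(6, dtype=np.float32).reshape(3, 2)  # 3 symbols, hidden size 2
    dur = np.array([2, 0, 3])                          # a symbol may receive 0 frames
    frames = length_regulate(enc, dur, num_mel_frames=5)
    print(frames.shape)  # (5, 2)
```

Raising on a mismatch, rather than silently padding or trimming, makes a bad duration file fail loudly instead of quietly degrading training.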

dathudeptrai commented 4 years ago

@Zak-SA Did you read the tutorial carefully? Did you extract durations from Tacotron2? The error is very clear: "AssertionError: Number of charactor, mel, duration, f0 and energy files are different". Can you check it yourself?
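The assertion fires when the character, mel, duration, F0 and energy files found by the dataset loader differ in number, so a quick first check is to see which utterances are missing from which feature. A minimal standard-library sketch, assuming each feature sits in its own folder with files named `<utt_id><suffix>`; the folder names and suffixes in the usage block are hypothetical placeholders, not necessarily what TensorFlowTTS writes, so adjust them to your own dump directory.

```python
from __future__ import annotations

from pathlib import Path


def find_mismatched_features(features: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """Report, per feature, the utterance IDs present elsewhere but missing here.

    `features` maps a feature name to (directory, filename_suffix). A file named
    "<utt_id><suffix>" in that directory counts as that utterance's feature.
    The suffix must be non-empty. The naming scheme is an assumption.
    """
    ids_per_feature: dict[str, set[str]] = {}
    for name, (directory, suffix) in features.items():
        # A directory that does not exist simply yields no files.
        files = Path(directory).glob(f"*{suffix}")
        ids_per_feature[name] = {f.name[: -len(suffix)] for f in files}

    all_ids = set().union(*ids_per_feature.values())
    # Keep only features that are missing at least one utterance.
    return {
        name: all_ids - ids
        for name, ids in ids_per_feature.items()
        if all_ids - ids
    }


if __name__ == "__main__":
    # Hypothetical layout: replace with the folders your preprocessing produced.
    report = find_mismatched_features({
        "ids": ("dump/train/ids", "-ids.npy"),
        "mel": ("dump/train/norm-feats", "-norm-feats.npy"),
        "duration": ("dump/train/durations", "-durations.npy"),
        "f0": ("dump/train/raw-f0", "-raw-f0.npy"),
        "energy": ("dump/train/raw-energies", "-raw-energy.npy"),
    })
    for name, missing in sorted(report.items()):
        print(f"{name}: {len(missing)} missing, e.g. {sorted(missing)[:3]}")
```

An empty report means every feature covers the same utterances; a feature whose folder is missing or points to the wrong place shows up with every utterance listed as missing.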

Zak-SA commented 4 years ago

@dathudeptrai I will check it again. Thanks for the links.

Zak-SA commented 4 years ago

@dathudeptrai I did extract the durations, but the path was different. I read the tutorial carefully, but after replacing the GPU I preprocessed the data again, and this time preprocessing added the speaker name to the dump folder name, while the durations were still in the old dump folder. That's why I got confused: I knew I had extracted the durations. Thanks again for the help.

tekinek commented 4 years ago

@Zak-SA I think the safest first choice for duration extraction is Tacotron2. It requires training a Tacotron2 model first, to around 60k steps: https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/tacotron2#step-4-extract-duration-from-alignments-for-fastspeech
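Duration extraction from Tacotron2 relies on its attention alignment, which is why the model has to be trained far enough to align well before extracting. One common way to turn an alignment into durations is to assign each mel frame to the input symbol it attends to most and then count frames per symbol. A minimal NumPy sketch of that argmax-counting approach, assuming an alignment matrix of shape [num_symbols, num_mel_frames]; `durations_from_alignment` is a hypothetical name, and this is not the repository's extraction script.

```python
import numpy as np


def durations_from_alignment(alignment: np.ndarray) -> np.ndarray:
    """Convert an attention alignment into per-symbol durations.

    alignment: array of shape [num_symbols, num_mel_frames]; column t holds the
    attention weights over input symbols for decoder frame t.
    Returns an int array of shape [num_symbols] that sums to num_mel_frames.
    """
    num_symbols = alignment.shape[0]
    # Index of the most-attended input symbol for every mel frame.
    winner = np.argmax(alignment, axis=0)
    # Count frames per symbol; minlength keeps symbols that never win (duration 0).
    return np.bincount(winner, minlength=num_symbols)


if __name__ == "__main__":
    # Toy alignment: 3 symbols, 5 mel frames, roughly monotonic.
    align = np.array([
        [0.9, 0.6, 0.1, 0.0, 0.0],
        [0.1, 0.3, 0.2, 0.1, 0.0],
        [0.0, 0.1, 0.7, 0.9, 1.0],
    ])
    d = durations_from_alignment(align)
    print(d, d.sum())  # [2 0 3] 5
```

By construction the durations sum to the number of mel frames, which is the property a length regulator depends on. A poorly converged alignment still produces numbers, just wrong ones, so checking the alignment plots before extracting is worthwhile.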

Zak-SA commented 4 years ago

@tekinek Thanks for your reply. I extracted the durations earlier, at step 54k. I can redo it if you think 60k is better; I thought anything past 50k steps would be OK.

tekinek commented 4 years ago

@Zak-SA 54k should be enough. If the durations were extracted with Tacotron2, then your problem is more likely a user error. Double-check your commands and their parameters, or redo preprocessing and duration extraction step by step according to the instructions.

Zak-SA commented 4 years ago

@tekinek Yes, I explained the problem above. When I extracted the durations I was using my old GPU, and the folder containing the preprocessed data was named "dump". When I installed the new GPU, I decided to preprocess again, but the updated code added the speaker name to the dump folder name, so I ended up with two folders: one with the old preprocessed data and the extracted durations, and one without them.

The problem is solved now. Thank you for your help.
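The root cause here was two preprocessing outputs side by side, only one of which held the extracted durations. A small standard-library sketch for spotting that situation counts duration files under every dump-like folder in a project root. The `dump*` folder pattern and the `-durations.npy` suffix are assumptions for illustration; match them to what your preprocessing and extraction steps actually produced.

```python
from __future__ import annotations

from pathlib import Path


def count_duration_files(root: str, dump_glob: str = "dump*", suffix: str = "-durations.npy") -> dict[str, int]:
    """Count duration files under every dump-like folder directly inside `root`.

    The folder pattern and file suffix are hypothetical defaults.
    """
    counts: dict[str, int] = {}
    for dump_dir in sorted(Path(root).glob(dump_glob)):
        if dump_dir.is_dir():
            # rglob searches all nested subfolders (e.g. per-split folders).
            counts[dump_dir.name] = sum(1 for _ in dump_dir.rglob(f"*{suffix}"))
    return counts


if __name__ == "__main__":
    print(count_duration_files("."))
```

A non-zero count in one folder and zero in the folder the training command points to would reproduce the mismatch described in this thread.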