TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0
3.85k stars, 815 forks

Realtime #406

Closed: augustfr closed this issue 3 years ago

augustfr commented 3 years ago

What do I need to do to get the TTS to run in real time? I've attached the Python file I'm using to generate the output. It takes about 15 seconds before it prompts the user for text input, then about 7 seconds to produce the .wav file after receiving the text. I'm running it on an EC2 instance with a Tesla K80 GPU. Is there a way to make this much faster without upgrading the machine I'm running it on?

tacotron2.pdf

crux153 commented 3 years ago

Tacotron2 is slow. Use FastSpeech2 with MB-MelGAN :)

augustfr commented 3 years ago

I'm using FastSpeech2 with MB-MelGAN now. It is definitely faster than Tacotron2, but it still doesn't feel usable in real time. It takes about 7 seconds for mine to run the do_synthesis function and output the .wav file. Has anyone been able to increase that speed without upgrading hardware?

ZDisket commented 3 years ago

@augustfr What do you mean? FS2 + MB-MelGAN can run about 2.5x faster than realtime on the average consumer PC (CPU).
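For reference, "2.5x faster than realtime" can be read as a real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below shows that arithmetic only. The function name, the 22,050 Hz default sample rate, and the sample timings are illustrative assumptions, not measurements from this thread.

```python
def real_time_factor(synthesis_seconds: float,
                     num_samples: int,
                     sample_rate: int = 22050) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    RTF < 1 means faster than real time; RTF > 1 means slower.
    The 22,050 Hz default is an assumption; use your vocoder's actual rate.
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


# Illustrative numbers: 4 s of audio generated in 1.6 s of compute.
rtf = real_time_factor(1.6, 4 * 22050)
print(f"RTF = {rtf:.2f}, speed-up = {1 / rtf:.1f}x")
```

Measuring this way separates "how fast is synthesis" from "how long does the whole script take", which also includes model loading and file I/O.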

augustfr commented 3 years ago

@ZDisket I've attached the code I'm running. If anyone could tell me why it takes about 7 seconds to process after the text is entered, that would be great. I'm running it on an Ubuntu p2.xlarge instance.

fastspeech2-mb_melgan copy.txt

dathudeptrai commented 3 years ago

@augustfr What is your text :)) and what is your CPU?

augustfr commented 3 years ago

@dathudeptrai I was just doing one or two sentences. Something like "This is the new version of text to speech. Testing to see how it works"

I'm running it on a p2.xlarge EC2 instance, which has 4 vCPUs.

MaxMax2016 commented 3 years ago

@augustfr The time taken to initialize the model is long, but speech synthesis itself needs just 200 ms.
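One way to check where the time goes is to time initialization, the first synthesis call, and a later call separately. The sketch below uses only the standard library. `load_models` is a hypothetical placeholder that sleeps instead of loading real networks, so swap in the model-loading and synthesis code from your own script.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def load_models():
    """Hypothetical placeholder for the expensive one-time model setup."""
    time.sleep(0.05)  # stands in for reading weights and initializing the device
    # Returns a fake synthesizer that maps text to a list of "samples".
    return lambda text: [0.0] * (len(text) * 10)


with timed("init"):
    synthesize = load_models()
with timed("first_call"):
    synthesize("hello")
with timed("second_call"):
    synthesize("hello")

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

With real TensorFlow models the first call is often much slower than later ones, because graph tracing and device setup happen on first use. Timing the first and second calls separately makes that visible.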

augustfr commented 3 years ago

@dtx525942103 Is there a way to initialize the model ahead of time?
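A common pattern for this is to load the models once at start-up, run one throwaway "warm-up" synthesis, and then reuse the same objects for every request. The sketch below illustrates the pattern only. `Synthesizer`, `fake_loader`, and `LOAD_CALLS` are hypothetical names, and the sleeps simulate loading and first-call overhead rather than calling TensorFlowTTS.

```python
import time

LOAD_CALLS = []  # records each (simulated) model load, for illustration


class Synthesizer:
    """Load models once, warm them up, then reuse them for every request."""

    def __init__(self, load_fn, warmup_text="warm up"):
        self._synthesize = load_fn()    # expensive: runs exactly once
        self._synthesize(warmup_text)   # absorb first-call overhead up front

    def __call__(self, text):
        return self._synthesize(text)


def fake_loader():
    """Hypothetical stand-in for loading FastSpeech2 and MB-MelGAN."""
    LOAD_CALLS.append(time.perf_counter())
    time.sleep(0.05)  # simulated weight loading
    state = {"first_call": True}

    def synthesize(text):
        if state["first_call"]:
            time.sleep(0.02)  # simulated one-time first-call cost
            state["first_call"] = False
        return [0.0] * len(text)  # fake audio samples

    return synthesize


# Pay the start-up cost here, before any text is requested.
tts = Synthesizer(fake_loader)

for text in ["This is the new version of text to speech.",
             "Testing to see how it works."]:
    start = time.perf_counter()
    audio = tts(text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{len(audio)} samples in {elapsed_ms:.2f} ms")
```

In an interactive script this means constructing the synthesizer before the input prompt and keeping the process alive between requests, for example in a loop or a small server, instead of starting a new Python process for each sentence.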

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.