Closed: augustfr closed this issue 3 years ago
Tacotron2 is slow. Use FastSpeech2 with MB-MelGAN :)
I'm using FastSpeech2 with MB-MelGAN now. It's definitely faster than Tacotron2, but it still doesn't feel usable in realtime: it takes about 7 seconds to run the do_synthesis function and write the .wav file. Has anyone been able to increase that speed without upgrading computer hardware?
@augustfr What do you mean? FS2 + MB-MelGAN can run about 2.5x faster than realtime on the average consumer PC (CPU).
@ZDisket I've attached the code I'm running. If anyone could tell me why it takes about 7 seconds to process after the text is entered, that would be great. I'm running it on an Ubuntu p2.xlarge instance.
@augustfr What is your text :)) and what is your CPU?
@dathudeptrai I was just doing one or two sentences. Something like "This is the new version of text to speech. Testing to see how it works"
I'm running it on a p2.xlarge EC2 instance, which has 4 vCPUs.
@augustfr The time taken to initialize the model is long, but the speech synthesis itself only needs about 200 ms.
@dtx525942103 is there a way to initialize the model ahead of time?
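For reference, here is a minimal sketch of doing exactly that: load the models once at startup and reuse them for every request, so the user only waits for the actual synthesis. It assumes the TensorFlowTTS `AutoProcessor`/`TFAutoModel` inference API and the LJSpeech pretrained checkpoint names from the project's examples; swap in your own models and sample rate as needed.

```python
import soundfile as sf
import tensorflow as tf
from tensorflow_tts.inference import AutoProcessor, TFAutoModel

# Load everything once at startup -- this is the slow part (several seconds).
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")

def synthesize(text: str):
    """Text -> waveform, reusing the already-loaded models."""
    input_ids = processor.text_to_sequence(text)
    _, mel_after, _, _, _ = fastspeech2.inference(
        input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
        speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
        speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
        f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
        energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    )
    audio = mb_melgan.inference(mel_after)[0, :, 0]
    return audio.numpy()

# Warm-up call: the first inference also traces the TF graphs, so it is slow.
synthesize("warm up")

# After warm-up, each request only pays the steady-state synthesis cost.
while True:
    text = input("Text: ")
    sf.write("output.wav", synthesize(text), 22050, "PCM_16")
```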
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
What do I need to do to get the TTS to run in realtime? I've attached the Python file I'm using to generate the output. It takes about 15 seconds until it prompts the user for the text input, and then about 7 seconds to produce the .wav file after receiving the text. I'm running it on an EC2 instance with a Tesla K80 GPU. Is there a way to make this much faster without upgrading the machine I'm running it on? tacotron2.pdf
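If most of that 7 seconds is spent on the very first call, it is likely one-time TensorFlow graph tracing rather than the steady-state synthesis cost (which, per the comments above, should be a few hundred milliseconds). A quick way to check, reusing the hypothetical `synthesize()` helper from the sketch above:

```python
import time

# First call: includes one-time graph tracing, so it is expected to be slow.
t0 = time.perf_counter()
synthesize("This is the new version of text to speech.")
print(f"cold call: {time.perf_counter() - t0:.2f} s")

# Subsequent calls: only the real synthesis cost.
t0 = time.perf_counter()
synthesize("Testing to see how it works.")
print(f"warm call: {time.perf_counter() - t0:.2f} s")
```

If the warm call is fast, keeping the process alive (and doing a warm-up synthesis at startup) should be enough; only the cold call pays the loading and tracing cost.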