fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License
2.14k stars 698 forks

How to improve the synthesis speed on CPU #102

Open 1105060120 opened 5 years ago

1105060120 commented 5 years ago

Hello, everyone. Synthesizing with the model on CPU takes me 6 minutes. How can I speed it up and do real-time synthesis on CPU? Thank you.

oytunturk commented 5 years ago

Tacotron and WaveRNN won't run in real time on CPU. You may want to look into LPCNet as an alternative to WaveRNN. However, you'll still need a lighter model than Tacotron to predict acoustic features from text/linguistic features, such as a simpler feedforward or recurrent network.

1105060120 commented 5 years ago

@oytunturk But this paper from Google says it supports real time on CPU, and I saw somebody export it to C++ and run real-time inference on CPU.

1105060120 commented 5 years ago

@oytunturk Do you know how to export this to C++? Thanks.

oytunturk commented 5 years ago

Which Google paper are you referring to?


1105060120 commented 5 years ago

@oytunturk The WaveRNN paper.

oytunturk commented 5 years ago

That paper only discusses the vocoder portion, and yes, with the sparse WaveRNN model, which does heavy weight pruning, it's possible to run sample generation in real time from already-predicted spectrograms. How are you planning to generate the spectrograms from text on a CPU? Tacotron won't run that fast. Also, the tricks they implemented for fast WaveRNN inference are not available in this repo (as far as I know).
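For context, the weight pruning discussed above is magnitude-based: the smallest weights in each matrix are zeroed out so inference can skip them. Here is a minimal pure-Python sketch of the idea (illustrative only — the paper's scheme prunes gradually during training, and this repo does not implement it):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a weight matrix until
    roughly `sparsity` fraction of them are zero. Illustrative sketch of
    the idea behind sparse WaveRNN, not code from this repo."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else float("-inf")
    # Keep only weights whose magnitude exceeds the cutoff.
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

W = [[0.9, -0.1, 0.05], [0.3, -0.7, 0.02]]
print(magnitude_prune(W, 0.5))  # → [[0.9, 0.0, 0.0], [0.3, -0.7, 0.0]]
```

In the paper this is applied during training so the network can adapt to the lost weights; pruning a fully trained model in one shot, as above, degrades quality more.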


1105060120 commented 5 years ago

@oytunturk Maybe I can use FastSpeech to generate the spectrograms. The tricks for WaveRNN are not available?

oytunturk commented 5 years ago

Yes, you'll need a spectrogram generator plus a neural vocoder that are both significantly faster than real time on a CPU. I'd look into models that can be parallelized across multiple threads. I'd expect a significant quality loss with respect to the Tacotron + WaveNet/WaveRNN baselines, though.
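"Faster than real time" is usually quantified as the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, with RTF < 1.0 meaning real-time capable. A small sketch for measuring it (the `synthesize` callable is a hypothetical stand-in for any text-to-samples function):

```python
import time

def real_time_factor(synthesize, text, sample_rate=22050):
    """Return wall-clock synthesis time divided by the duration of the
    generated audio. RTF < 1.0 means faster than real time.
    `synthesize` is a placeholder for any text-to-samples function."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return elapsed / (len(samples) / sample_rate)

# Demo with a dummy "vocoder" that instantly returns 1 s of silence.
def dummy_vocoder(text):
    return [0.0] * 22050

print(f"RTF: {real_time_factor(dummy_vocoder, 'hello'):.6f}")
```

For a full TTS pipeline, measure the spectrogram model and the vocoder separately: the end-to-end RTF is bounded below by whichever stage is slower.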


TheButlah commented 5 years ago

@1105060120 For what it's worth, 6 minutes seems really slow. Although it's not real time, my 2016 MacBook Pro synthesizes at only about 5x slower than real time (not including Tacotron), and Tacotron is not particularly slow in my experience.

I'm sure that with weight pruning and some optimizations you could get it to work acceptably. It's sure as hell a lot better than the 15 minutes it takes to synthesize an utterance on a 2080 Ti with the original WaveNet.

1105060120 commented 5 years ago

I improved the speed of WaveRNN, thank you. @TheButlah

OliverMathias commented 5 years ago

@1105060120 how did you go about improving the speed? I'm trying to process longer texts and it's becoming quite time-consuming.

1105060120 commented 5 years ago

@OliverMathias Weight pruning and C++ inference.
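The two tricks go together: pruning only pays off at inference time if the zeroed weights are actually skipped, which is what a C++ port does with a sparse matrix format. A pure-Python sketch of a CSR (compressed sparse row) mat-vec, the kind of kernel such a port would implement (illustrative only, not code from this repo or from the commenter):

```python
def to_csr(dense):
    """Pack the nonzero entries of a dense matrix into CSR form:
    parallel value/column arrays plus per-row offsets."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def csr_matvec(csr, x):
    """Matrix-vector product that touches only the stored nonzeros, so
    a 90%-pruned layer does roughly 10% of the dense multiply-adds."""
    values, cols, row_ptr = csr
    out = []
    for r in range(len(row_ptr) - 1):
        s = 0.0
        for i in range(row_ptr[r], row_ptr[r + 1]):
            s += values[i] * x[cols[i]]
        out.append(s)
    return out

W = [[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]
print(csr_matvec(to_csr(W), [1.0, 1.0, 1.0]))  # → [3.0, 3.0]
```

In practice the paper uses block sparsity (pruning whole 4x4 or 16x1 blocks) so the inner loop vectorizes well; element-wise CSR as shown here is the simplest form of the idea.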

muntasir2000 commented 5 years ago

> @OliverMathias weight pruning and C++ infer.

Can you share some more details on how you did the weight pruning and C++ inference?

OswaldoBornemann commented 4 years ago

@1105060120 Would you mind sharing the inference speed on 1 s of audio?