Open 1105060120 opened 5 years ago
Tacotron and WaveRNN won't work in real time on CPU. You may want to look into LPCNet as an alternative to WaveRNN. However, you'll still need to replace Tacotron with a simpler (and likely inferior) model to predict acoustic features from text/linguistic features, such as a feedforward or recurrent network.
@oytunturk But this paper from Google says it supports real-time synthesis on CPU, and I saw somebody export it to C++ and run real-time inference on CPU.
@oytunturk Do you know how to export this to C++? Thanks.
Which Google paper are you referring to?
@oytunturk The WaveRNN paper.
That paper only discusses the vocoder portion, and yes, with the sparse WaveRNN model, which does heavy weight pruning, it's possible to run sample generation from already-predicted spectrograms in real time. How are you planning to generate the spectrogram from text on a CPU? Tacotron won't work that fast. Also, the tricks they implemented for fast WaveRNN inference are not available in this repo (as far as I know).
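For reference, the core pruning trick is simple magnitude-based sparsification: zero out the smallest weights so the sample-generation mat-vecs can skip them. A minimal NumPy sketch of the idea (not the paper's exact gradual-pruning schedule; `magnitude_prune` is a hypothetical helper name):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries until roughly
    `sparsity` fraction of the matrix is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.random.randn(512, 512)
pruned = magnitude_prune(w, 0.95)
achieved = 1.0 - np.count_nonzero(pruned) / pruned.size  # ≈ 0.95
```

In the paper this is applied gradually during training (pruning a little more every few hundred steps) so the network can recover, rather than in one shot after training.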
@oytunturk Maybe I can use FastSpeech to generate the spectrograms. The tricks for WaveRNN are not available?
Yes, you’ll need a spectrogram generator + a neural vocoder that are both significantly faster than real-time on a CPU. I’d look into models that can be parallelized to use multi-threading. I’d expect significant quality loss with respect to Tacotron+WaveNet/WaveRNN baselines.
@1105060120 For what it's worth, 6 minutes seems really slow. Although it's not real-time, my 2016 MacBook Pro synthesizes at about 5x slower than real-time speed (not including Tacotron), and Tacotron is not particularly slow in my experience.
I'm sure that if you used weight pruning and did some optimizations you could get it to work acceptably. It's certainly a lot better than the 15 minutes it takes to synthesize an utterance on a 2080 Ti with the original WaveNet.
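To make "Nx slower than real time" concrete: it's the real-time factor (RTF), i.e. wall-clock synthesis time divided by the duration of the audio produced, where RTF < 1.0 means faster than real time. A minimal sketch for benchmarking your own setup (the 22050 Hz sample rate and the dummy vocoder loop are assumptions for illustration):

```python
import time
import numpy as np

SAMPLE_RATE = 22050  # assumed; use your model's actual sample rate

def real_time_factor(synth_fn, n_samples):
    """Wall-clock time / audio duration; < 1.0 means real-time capable."""
    start = time.perf_counter()
    synth_fn(n_samples)
    elapsed = time.perf_counter() - start
    return elapsed / (n_samples / SAMPLE_RATE)

# dummy stand-in for a vocoder's per-sample generation loop
rtf = real_time_factor(lambda n: np.cumsum(np.random.randn(n)), SAMPLE_RATE)
```

So "5x slower than real time" corresponds to an RTF of about 5, and 6 minutes for a short utterance is far above that.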
I improved the speed of WaveRNN, thank you. @TheButlah
@1105060120 How did you go about improving the speed? I'm trying to process longer texts and it's becoming quite time-consuming.
@OliverMathias Weight pruning and C++ inference.
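In case it helps others reading this: the C++ speedup mostly comes from storing the pruned weight matrices in a sparse format so the mat-vecs skip the zeros entirely. A rough Python sketch of the idea using CSR storage (a real implementation would do this in C++ with blocked sparsity and SIMD; `to_csr`/`csr_matvec` are illustrative names):

```python
import numpy as np

def to_csr(dense):
    """Convert a pruned dense matrix to CSR arrays
    (values, column indices, row pointers)."""
    values, cols, indptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        cols.extend(nz)
        indptr.append(len(values))
    return np.array(values), np.array(cols), np.array(indptr)

def csr_matvec(values, cols, indptr, x):
    """y = A @ x, touching only the nonzero entries of A."""
    y = np.empty(len(indptr) - 1)
    for i in range(len(y)):
        s, e = indptr[i], indptr[i + 1]
        y[i] = values[s:e] @ x[cols[s:e]]
    return y

dense = np.random.randn(8, 8)
dense[np.abs(dense) < 1.0] = 0.0  # pretend this matrix was pruned
x = np.random.randn(8)
v, c, p = to_csr(dense)
assert np.allclose(csr_matvec(v, c, p, x), dense @ x)
```

At ~95% sparsity this cuts the per-sample multiply count by roughly 20x, which is where most of the CPU real-time headroom comes from.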
Can you share some more details on how you did the weight pruning and C++ inference?
@1105060120 Would you mind sharing the inference speed for 1 s of audio?
Hello, everyone. Synthesizing with this model takes 6 minutes on CPU. How can I speed it up and do real-time synthesis on CPU? Thank you.