Inference time is so slow.

fatchord / WaveRNN

WaveRNN Vocoder + TTS

https://fatchord.github.io/model_outputs/

MIT License

Inference time is so slow. #190

Closed chazo1994 closed 4 years ago

chazo1994 commented 4 years ago

I used an RTX 2080 Ti GPU to train the model and to run inference. Training is fast, but inference is very slow (I run gen_wavernn.py on a wav file). I noticed that after upsampling, the number of frames increases by a factor of hundreds. As far as I know, WaveRNN is supposed to be very fast compared with other neural vocoders.
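To put the upsampling observation in numbers: the vocoder produces one audio sample per autoregressive step, so each mel frame expands into as many steps as the hop length. The sketch below is only an illustration; the values of sample_rate and hop_length are assumptions, not taken from this thread, so check your own hparams.py.

```python
# Illustrative values only (assumptions); read the real ones from hparams.py.
sample_rate = 22050   # audio samples per second
hop_length = 275      # audio samples produced per mel frame


def samples_to_generate(num_mel_frames: int, hop: int) -> int:
    """Number of sequential generation steps: one per output audio sample."""
    return num_mel_frames * hop


mel_frames = 400  # roughly 5 seconds of audio at the settings above
steps = samples_to_generate(mel_frames, hop_length)

print(steps)                # 110000
print(steps / sample_rate)  # duration in seconds, a little under 5
```

With these assumed settings, a 5-second clip needs about 110,000 sequential steps when generated without batching, which is why unbatched inference feels slow even on a fast GPU.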

mindmapper15 commented 4 years ago

Without batched generation, the inference speed of the WaveRNN vocoder is slow, just like any other WaveNet-based neural vocoder.

What makes WaveRNN generate audio quickly is batched mode generation, which splits a single utterance into multiple segments and generates them in parallel.

To use batched mode generation and speed up audio generation, set voc_gen_batched=True in hparams.py.

NOTE: batched mode generation is a trade-off. If you set voc_target in hparams.py to a smaller value, generation gets faster but the quality of the generated audio gets worse.
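The idea of splitting one utterance into segments that are generated in parallel can be sketched as follows. This is a minimal NumPy illustration, not the repository's implementation; the function name fold_with_overlap and the overlap parameter are assumptions made for the example.

```python
import numpy as np


def fold_with_overlap(x, target, overlap):
    """Split a 1-D array into overlapping segments.

    Each segment has length target + 2 * overlap, and consecutive
    segments share `overlap` samples. The input is zero-padded at the
    end so the last segment is full length.
    Returns an array of shape (num_folds, target + 2 * overlap).
    """
    x = np.asarray(x)
    step = target + overlap          # distance between segment starts
    seg_len = target + 2 * overlap   # length of every segment
    num_folds = max(1, int(np.ceil((len(x) - overlap) / step)))

    padded = np.zeros(num_folds * step + overlap, dtype=x.dtype)
    padded[: len(x)] = x

    return np.stack(
        [padded[i * step : i * step + seg_len] for i in range(num_folds)]
    )


# Hypothetical utterance of 100,000 samples, with assumed segment settings.
utterance = np.arange(100_000)
folds = fold_with_overlap(utterance, target=11_000, overlap=550)
print(folds.shape)  # (9, 12100)
```

All rows can then be fed through the network as one batch, so the sequential loop runs for 12,100 steps instead of 100,000. The overlapping regions still have to be blended when the segments are stitched back together.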

chazo1994 commented 4 years ago

I already use batched generation, but it is still very slow.

mindmapper15 commented 4 years ago

I don't know how much speed you expect, but decreasing voc_target in hparams.py would help speed up inference. The quality of the synthesized audio will get worse, though.
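A rough way to see the effect of voc_target is to count how many segments an utterance is split into and how many sequential steps each segment needs. The helper below is a back-of-the-envelope estimate under assumed values (an overlap of 550 samples and a 110,000-sample utterance); batched_cost is a hypothetical name, not a function from the repository.

```python
import math


def batched_cost(total_samples, target, overlap):
    """Estimate (number of parallel segments, sequential steps per segment)."""
    step = target + overlap
    num_folds = max(1, math.ceil((total_samples - overlap) / step))
    sequential_steps = target + 2 * overlap
    return num_folds, sequential_steps


# Halving the target roughly halves the sequential steps
# and roughly doubles the number of segments (the batch size).
for target in (11_000, 5_500, 2_750):
    print(target, batched_cost(110_000, target, overlap=550))
# 11000 (10, 12100)
# 5500 (19, 6600)
# 2750 (34, 3850)
```

A smaller target means fewer sequential steps, but also more segment boundaries to stitch together, which is presumably where the loss in audio quality comes from.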