bshall / UniversalVocoding

A PyTorch implementation of "Robust Universal Neural Vocoding"
https://bshall.github.io/UniversalVocoding/
MIT License

How to improve performance #21

Open Kerry0123 opened 3 years ago

Kerry0123 commented 3 years ago

Hello, it takes 25 seconds to generate three seconds of audio (sample_rate 22050, about 15 words). Do you have any good ideas for performance optimization? We can discuss it. Thank you.
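For reference, the timing above can be turned into a real-time speed figure. This is only a small arithmetic sketch using the numbers quoted in this comment; the function name `realtime_speed` is made up for illustration and is not part of the repository.

```python
def realtime_speed(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (1.0 means real-time)."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Numbers quoted in the comment above.
sample_rate = 22050        # samples per second of audio
audio_seconds = 3.0        # length of the generated clip
synthesis_seconds = 25.0   # wall-clock time spent generating it

speed = realtime_speed(audio_seconds, synthesis_seconds)
samples_per_second = audio_seconds * sample_rate / synthesis_seconds

print(f"{speed:.2f}x real-time, {samples_per_second:.0f} samples/s")
# prints "0.12x real-time, 2646 samples/s"
```

So the reported run is roughly 0.12x real-time, or about 2,646 generated samples per second of compute.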

bshall commented 3 years ago

Hi @Kerry0123,

Yeah, I get about 0.225x real-time with the 16kHz model. There are a number of tricks you can try to improve speed. You could probably apply most of the optimizations from the WaveRNN paper. Specifically, you'd need to implement:

  1. a single persistent GPU operation for sampling.
  2. structured sparsity.
  3. subscale sampling.
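As a rough illustration of item 2, structured (block) sparsity can be approximated by magnitude pruning over whole blocks of a weight matrix rather than individual weights. The sketch below uses NumPy to stay self-contained; the helper `block_sparsity_mask`, the 4x4 block shape, and the 75% sparsity target are all illustrative assumptions, not code from this repository and not necessarily the exact scheme used in the WaveRNN paper.

```python
import numpy as np


def block_sparsity_mask(weight, block=(4, 4), sparsity=0.9):
    """Return a boolean mask that zeroes the lowest-magnitude blocks.

    Hypothetical helper: `block` and `sparsity` are illustrative choices.
    """
    rows, cols = weight.shape
    br, bc = block
    if rows % br or cols % bc:
        raise ValueError("weight shape must be divisible by block shape")

    # View the matrix as a grid of blocks and score each block by mean |w|.
    blocks = np.abs(weight).reshape(rows // br, br, cols // bc, bc)
    scores = blocks.mean(axis=(1, 3))

    # Drop the requested fraction of blocks, lowest scores first.
    n_prune = int(round(sparsity * scores.size))
    keep = np.ones(scores.size, dtype=bool)
    keep[np.argsort(scores, axis=None)[:n_prune]] = False
    keep = keep.reshape(scores.shape)

    # Expand the per-block decision back to a per-weight mask.
    return np.repeat(np.repeat(keep, br, axis=0), bc, axis=1)


rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))   # stand-in for a recurrent weight matrix
mask = block_sparsity_mask(w, block=(4, 4), sparsity=0.75)
w_sparse = w * mask                 # pruned weights; 8 of 32 blocks survive
```

During training, a mask like this would typically be recomputed on a schedule and re-applied to the weights after each optimizer step. The speed-up only materializes if inference then uses a kernel that skips the zeroed blocks.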

Unfortunately, I don't have much time to work on these optimizations, but I'd be happy to review and accept any pull requests if you're interested in working on them.

Kerry0123 commented 3 years ago

Thank you for your reply. I will keep reporting on my progress.