L0SG / WaveFlow

A PyTorch implementation of "WaveFlow: A Compact Flow-based Model for Raw Audio" (ICML 2020)
https://arxiv.org/abs/1912.01219
BSD 3-Clause "New" or "Revised" License

Online speech synthesis and server code #1

Closed lalimili6 closed 3 years ago

lalimili6 commented 4 years ago

Hi, can you share your test waveforms? Are they like the samples at https://waveflow-demo.github.io/? Another question: is there a server-side synthesizer? Have you compared the synthesis time with Tacotron (like Mozilla's)? Is it faster? Best regards.

L0SG commented 4 years ago

Sorry for the late reply. Here's a zipped waveform sample of the trained model with 128 residual channels. I guess it's similar to their results.

Since I'm currently a graduate student, I don't have the resources to host a deployment server. You may need to clone the repository and try training the model with the released code.

Currently, the model (with height=8 and 64 channels) synthesizes at ~93 kHz on a V100, which I believe is around 2x slower than the results in the paper. I'm not sure why, but the current implementation may still contain some redundant ops.
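
For anyone who wants to check a throughput figure like this on their own hardware, here is a minimal sketch of how such a number could be measured. It is not code from this repository: `throughput_khz`, `real_time_factor`, and `dummy_synthesize` are hypothetical names, the 22,050 Hz sample rate is an assumption, and the dummy callable merely stands in for a real model call.

```python
import time


def throughput_khz(synthesize, num_samples, repeats=3):
    """Best-of-N synthesis rate in kHz (thousands of samples per second).

    `synthesize` is a hypothetical callable that generates `num_samples`
    audio samples. For a GPU model, the callable should wait for the device
    to finish (e.g. torch.cuda.synchronize()) before returning, otherwise
    the clock only measures kernel launch time.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        synthesize(num_samples)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero reading from the timer
    return num_samples / best / 1000.0


def real_time_factor(rate_khz, sample_rate_hz=22050):
    """How many times faster than real time a given rate is.

    The default sample rate is an assumption, not taken from this thread.
    """
    return rate_khz * 1000.0 / sample_rate_hz


def dummy_synthesize(num_samples):
    """Stand-in for a real model: returns silence of the requested length."""
    return [0.0] * num_samples


if __name__ == "__main__":
    print(f"dummy rate: {throughput_khz(dummy_synthesize, 100_000):.1f} kHz")
    print(f"93 kHz is about {real_time_factor(93.0):.1f}x real time")
```

Under the assumed 22,050 Hz sample rate, ~93 kHz corresponds to roughly 4.2x faster than real time.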

If sampling speed is your primary target, you may be interested in LPCNet from Mozilla, which also provides a heavily optimized codebase.