Online speech synthesis and server code #1

L0SG / WaveFlow

A PyTorch implementation of "WaveFlow: A Compact Flow-based Model for Raw Audio" (ICML 2020)

BSD 3-Clause "New" or "Revised" License


Sorry for the late reply. Here's a zipped waveform sample from the trained model with 128 residual channels. I think it's similar to their results.

Since I'm currently a graduate student, I don't have the resources to host a deployment server. You may need to clone the repository and try training the model with the released code.

Currently, the model (with height=8 and 64 channels) reaches ~93 kHz on a V100, which I believe is around 2x slower than the results reported in the paper. I'm not sure why, but there may still be some redundant ops in the current implementation.
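
For anyone who wants to check a speed figure like this on their own hardware, here is a minimal sketch of how a kHz number relates to real-time playback. The helper names and the `synthesize` callable are hypothetical and not part of this repository, and the 22.05 kHz playback rate is an assumption, so substitute the sample rate of your own dataset.

```python
import time


def synthesis_rate_khz(num_samples, elapsed_seconds):
    """Synthesis speed in kHz: thousands of audio samples generated per second."""
    return num_samples / elapsed_seconds / 1000.0


def real_time_factor(rate_khz, sample_rate_hz):
    """Seconds of audio produced per second of wall-clock time."""
    return rate_khz * 1000.0 / sample_rate_hz


def time_synthesis(synthesize, num_samples):
    """Time a hypothetical `synthesize(num_samples)` callable and return its speed in kHz."""
    start = time.perf_counter()
    synthesize(num_samples)
    elapsed = time.perf_counter() - start
    return synthesis_rate_khz(num_samples, elapsed)


# ~93 kHz is the figure quoted above; the 22.05 kHz playback rate is an assumption.
rtf = real_time_factor(93.0, 22050)
print(f"{rtf:.1f}x faster than real time")  # prints "4.2x faster than real time"
```

If the callable runs on a GPU through PyTorch, the work is launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is needed for the timing to be meaningful.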

If sampling speed is your primary target, you may be interested in LPCNet from Mozilla, which also provides a heavily optimized codebase.
