PlayVoice / vits_chinese

Best practice TTS based on BERT and VITS with some Natural Speech Features Of Microsoft; Support ONNX streaming out!
https://huggingface.co/spaces/maxmax20160403/vits_chinese
MIT License

About quantization #61

Closed pgolds closed 1 year ago

pgolds commented 1 year ago

Have you done any research on quantization? What kind of performance can it achieve?

MaxMax2016 commented 1 year ago

Quantization is rarely used in TTS, I think. Wouldn't it introduce noise into the audio? It is fairly common for speech-to-text, though.

pgolds commented 1 year ago

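I converted the model to OpenVINO FP16 format and tested it. I couldn't hear much difference, and CPU synthesis is about 1/3 faster. For GPU use I tried converting to TensorRT, but that currently fails: the encoder contains unsupported types, and I don't know how to get it to convert.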
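As a rough illustration of why FP16 may be hard to hear while lower-bit quantization is riskier, here is a minimal sketch that compares the round-trip error of FP16 against symmetric int8 on a list of made-up values. It uses only the Python standard library. The `weights` list, the helper names, and the per-tensor int8 scheme are assumptions for illustration; they are not taken from vits_chinese or OpenVINO, and the sketch says nothing about how errors propagate through a real model.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_int8(x: float, scale: float) -> float:
    """Symmetric int8 quantization: snap to an integer level, then dequantize."""
    q = max(-127, min(127, round(x / scale)))
    return q * scale


def snr_db(ref, approx):
    """Signal-to-noise ratio in dB between reference and approximated values."""
    signal = sum(r * r for r in ref)
    noise = sum((r - a) ** 2 for r, a in zip(ref, approx))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# Hypothetical stand-in for model weights: deterministic values in [-1, 1].
weights = [math.sin(0.37 * i) * math.exp(-0.001 * i) for i in range(2000)]

# One scale for the whole list (per-tensor quantization, an assumption).
scale = max(abs(w) for w in weights) / 127

fp16_snr = snr_db(weights, [to_fp16(w) for w in weights])
int8_snr = snr_db(weights, [to_int8(w, scale) for w in weights])

print(f"FP16 round-trip SNR: {fp16_snr:.1f} dB")
print(f"int8 round-trip SNR: {int8_snr:.1f} dB")
```

On values like these, the FP16 round trip keeps a much higher SNR than int8, which is consistent with FP16 sounding about the same. Whether int8 is audible in practice would need a listening test on the real model.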

MaxMax2016 commented 1 year ago

FP16 is indeed a good choice. Many VITS projects provide ONNX model export, but I haven't done the specifics myself.

MaxMax2016 commented 1 year ago

@pgolds I hope this project can solve your problem: https://github.com/rhasspy/piper/tree/master/src/python/piper_train

pgolds commented 1 year ago

Thanks, I'll take a look.