TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Has anyone converted MelGAN to TFLite format? #464

Closed panyongfeng closed 3 years ago

panyongfeng commented 3 years ago

Dear all, has anyone converted the MelGAN generator from H5 to TFLite? Thanks.

dathudeptrai commented 3 years ago

@panyongfeng can you refer to this notebook (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/notebooks/Parallel_WaveGAN_TFLite.ipynb) first?

panyongfeng commented 3 years ago

Dear Dathudeptrai, thanks for the information. I can now convert the MelGAN model to TFLite format, but I am facing another problem: the wav files generated by the TFLite models (FastSpeech2 and MelGAN, both converted from the same H5 weights) are not as good as those from the models that load the H5 weights directly. There is background noise in the TFLite-generated audio. I have attached a wav from FastSpeech2 TFLite + MelGAN TFLite, and a wav from the "End-to-End examples" section of https://github.com/TensorSpeech/TensorFlowTTS. Is there any way to remove the background noise, or to make the TFLite-generated audio as good as the audio from the H5-weight models? Thanks a lot.

Here is the link to the wav files: https://pan.baidu.com/s/1ErKESQ_zyvmCqhPGuqlUrQ
access code: 2j2x

dathudeptrai commented 3 years ago

I think you are using quantization. Please use "float16" when you convert your MelGAN. :D
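For readers following along, a float16 conversion might look roughly like the sketch below. It uses the standard `tf.lite.TFLiteConverter` options for float16 post-training quantization. `TinyVocoder` is a hypothetical stand-in so the snippet is self-contained; it is not the MelGAN generator from this repository, and the `[1, None, 80]` mel input shape is an assumption made for illustration.

```python
import tensorflow as tf


class TinyVocoder(tf.Module):
    """Hypothetical stand-in for a vocoder: mels [1, T, 80] -> samples [1, T]."""

    def __init__(self):
        super().__init__()
        # One weight per mel bin; a real generator has many conv layers.
        self.w = tf.Variable(tf.random.normal([80]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([1, None, 80], tf.float32)])
    def inference(self, mels):
        # Weighted sum over the mel axis, squashed to [-1, 1] like audio.
        return tf.tanh(tf.reduce_sum(mels * self.w, axis=-1))


def convert_float16(module):
    """Convert module.inference to a TFLite flatbuffer with float16 settings."""
    concrete_fn = module.inference.get_concrete_function()
    converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn])
    # Enable post-training optimization ...
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # ... and request float16 instead of the default 8-bit weight quantization.
    converter.target_spec.supported_types = [tf.float16]
    return converter.convert()


vocoder = TinyVocoder()  # keep a reference alive while converting
tflite_bytes = convert_float16(vocoder)
```

The returned bytes can be written to a `.tflite` file. For the real model, the concrete function would come from the loaded MelGAN generator instead of the stand-in.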

panyongfeng commented 3 years ago

You are right, I will try float16 later. As far as I know, a quantized model runs about 2-4 times faster than a float32 model on mobile devices like the iPhone (and is also about 4 times smaller). So is there any way to balance good wav quality against model speed? :D
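To get a rough feel for the trade-off, the toy sketch below rounds the same set of made-up weights to float16 and to a simple symmetric 8-bit grid, then compares the rounding error and the storage per weight. This is only an illustration under stated assumptions: the weights are synthetic, and the 8-bit scheme here is a simplified one, not TFLite's actual quantizer.

```python
import math
import struct


def to_float16(x):
    """Round a Python float to IEEE half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_int8(x, scale):
    """Round to a symmetric 8-bit grid with the given step size and back."""
    q = max(-128, min(127, round(x / scale)))
    return q * scale


# Synthetic "weights" in [-0.5, 0.5]; purely illustrative values.
weights = [0.5 * math.sin(0.37 * i) for i in range(1000)]
scale = max(abs(w) for w in weights) / 127.0


def rms_error(approx):
    """Root-mean-square difference between the weights and their rounded form."""
    return math.sqrt(sum((w - approx(w)) ** 2 for w in weights) / len(weights))


err_fp16 = rms_error(to_float16)
err_int8 = rms_error(lambda w: to_int8(w, scale))

# Storage cost per weight for each format.
bytes_per_weight = {"float32": 4, "float16": 2, "int8": 1}

print(f"float16 RMS error: {err_fp16:.2e} ({bytes_per_weight['float16']} bytes/weight)")
print(f"int8    RMS error: {err_int8:.2e} ({bytes_per_weight['int8']} byte/weight)")
```

In this toy setting float16 halves the size of float32 with much smaller rounding error than the 8-bit grid, while 8-bit gives the 4x size reduction mentioned above at the cost of more rounding noise. Whether that noise is audible in a real vocoder has to be checked by listening.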