TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

training mb_melgan with baker gets noise at the end of sample #713

ttsking closed this issue 2 years ago

ttsking commented 2 years ago

When I train mb_melgan on the Baker dataset (the official one or my own), I find that if a sample has silence at the end, the generated result turns into noise there.

For example: b'7094'-marked and b'7849'-marked

This issue doesn't affect silence in the middle of a sample: b'6887'

I use the standard baker_preprocess.yaml with "trim_silence: true". Is there anything that can be done to improve the results?

dathudeptrai commented 2 years ago

@ttsking This is not a problem in real inference: you know the length of the mel, so you know exactly the length of the generated audio, and you can remove the noise at the end. (The noise you hear corresponds to the padded mel frames used in training.)
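A minimal sketch of that trimming step, for illustration only. It assumes the vocoder emits `hop_size` audio samples per mel frame; `trim_to_mel_length` is a hypothetical helper, not part of TensorFlowTTS, and the `hop_size = 300` value is an assumption that should be checked against your own preprocessing config.

```python
import numpy as np


def trim_to_mel_length(audio, mel_length, hop_size):
    """Keep only the samples that correspond to real (unpadded) mel frames.

    audio:      generated waveform, shape (..., num_samples)
    mel_length: number of real mel frames, excluding padding
    hop_size:   audio samples per mel frame (assumed; check your config)
    """
    audio = np.asarray(audio)
    valid_samples = mel_length * hop_size
    # Slice along the last (time) axis so batched input also works.
    return audio[..., :valid_samples]


# Illustrative values only: 100 real mel frames padded to 120 frames.
hop_size = 300            # assumption, not read from any config file
real_mel_frames = 100
padded_mel_frames = 120

# Stand-in for vocoder output generated from the padded mel.
audio = np.random.randn(padded_mel_frames * hop_size).astype(np.float32)

trimmed = trim_to_mel_length(audio, real_mel_frames, hop_size)
print(trimmed.shape)
```

With a batch of different-length utterances, the same slice would need to be applied per item using each item's own mel length.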

ttsking commented 2 years ago

OK, thanks for your comments.