extract duration for fastspeech2

TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

https://tensorspeech.github.io/TensorFlowTTS/

Apache License 2.0


extract duration for fastspeech2 #664

Closed by karrdy89 2 years ago

karrdy89 commented 3 years ago

Hello. Thanks for the great repo. I am trying to train a pre-trained FastSpeech2 model (KSS). It seems that durations are needed for training, and that they can be extracted by training a Tacotron model. I have a question here: aren't durations also necessary for training Tacotron2? Also, how many epochs are required for accurate extraction?

dathudeptrai commented 3 years ago

@karrdy89 You need to train Tacotron2 to extract durations for FS2. Around 50k-80k training steps is good enough for extracting durations.
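For readers wondering what "extracting durations" from a trained Tacotron2 amounts to, here is a minimal sketch of the usual idea, not the repository's own script. It assumes you already have the attention alignment for one utterance as a NumPy array of shape `[decoder_frames, encoder_tokens]`; the function name `durations_from_alignment` and the toy matrix are hypothetical and only for illustration. Each mel frame is assigned to the input token it attends to most, and a token's duration is the number of frames assigned to it.

```python
import numpy as np


def durations_from_alignment(alignment: np.ndarray) -> np.ndarray:
    """Convert an attention alignment into per-token frame counts.

    alignment: array of shape [decoder_frames, encoder_tokens] holding the
    attention weights of a trained attention-based model (assumed input).
    Returns an int32 array of length encoder_tokens.
    """
    num_tokens = alignment.shape[1]
    # For every mel frame, pick the token that receives the most attention.
    best_token = np.argmax(alignment, axis=1)
    # Count how many frames were assigned to each token.
    return np.bincount(best_token, minlength=num_tokens).astype(np.int32)


# Toy alignment: 5 mel frames attending over 3 input tokens.
alignment = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
    [0.0, 0.1, 0.9],
])

durations = durations_from_alignment(alignment)
print(durations)  # [2 1 2]

# The durations must add up to the number of mel frames.
assert durations.sum() == alignment.shape[0]
```

Because every frame is counted exactly once, the durations always sum to the mel length, which is the property a duration predictor target needs. The quality of the result depends on how sharp and monotonic the learned alignment is, which is why the attention model has to be trained for a while first.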

karrdy89 commented 3 years ago

I get it. Thanks for the reply :)

Pydataman commented 3 years ago

> @karrdy89 You need to train Tacotron2 to extract durations for FS2. Around 50k-80k training steps is good enough for extracting durations.

If there is only a small amount of data from an unseen speaker, how can the duration info be extracted?
