Tacotron2 GST fails for short sentences

NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP

https://nvidia.github.io/OpenSeq2Seq

Apache License 2.0

Tacotron2 GST fails for short sentences #441

Closed mrgloom closed 5 years ago

mrgloom commented 5 years ago

It looks like tacotron_gst produces bad results for short sentences:

time python run.py --config_file=example_configs/text2speech/tacotron_gst.py --mode=infer --infer_output_file=unused

I have tried:

Hello!
Hello, how are you?

Why does Tacotron2 GST have this limitation? How can it be fixed?

samples.zip

blisc commented 5 years ago

As a general rule of thumb, if tacotron (without gst) does not work, then tacotron_gst probably won't work either. It is possible that the style wav does not work well with the gst model. Other than that, I am not too sure. Conditional speech synthesis is a more difficult problem than unconditional speech synthesis.
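
One way to act on this rule of thumb is to run the same short sentences through a non-GST Tacotron config first, then through tacotron_gst, and compare the outputs. The sketch below only builds the two commands. It reuses the flags from the command in the original report; the baseline config path is a placeholder that is not verified against the repository, and the helper name infer_command is made up for illustration.

```python
import shlex

# Config path taken from the command in the original report.
GST_CONFIG = "example_configs/text2speech/tacotron_gst.py"
# Placeholder (assumption): replace with the path of a non-GST Tacotron
# config that exists in your checkout.
BASELINE_CONFIG = "path/to/non_gst_tacotron_config.py"


def infer_command(config_file):
    """Build the inference command from the report for a given config file."""
    return [
        "python", "run.py",
        f"--config_file={config_file}",
        "--mode=infer",
        "--infer_output_file=unused",
    ]


if __name__ == "__main__":
    # Print the baseline command first: if the baseline already fails on
    # short sentences, the GST model is unlikely to do better.
    for label, config in [("baseline", BASELINE_CONFIG), ("gst", GST_CONFIG)]:
        print(f"{label}: {shlex.join(infer_command(config))}")
```

If the baseline also produces bad audio for "Hello!", the problem is probably not specific to GST. If only the GST run fails, trying a different style wav would be the next thing to check.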