NVIDIA / flowtron

Flowtron is an auto-regressive, flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.
https://nv-adlr.github.io/Flowtron
Apache License 2.0

Unintelligible speech - inference on pre-trained models #19

Closed: adrianastan closed this issue 4 years ago

adrianastan commented 4 years ago

I am trying to synthesise audio using the available pre-trained models:

python3 inference.py -c config.json -f models/flowtron_ljs.pt -w models/waveglow_256channels_universal_v4.pt -t "Hey hello there" -o output_synth/ -i 0

but the output is not intelligible:

https://drive.google.com/file/d/1bWpbnMoRF5lm5RYwxZNj8bomiY_WF3mA/view?usp=sharing

The alignment also looks off:

[Image: attention plot sid0_sigma0.5_attnlayer0]

[Image: attention plot sid0_sigma0.5_attnlayer1]

I tried with both LJS and LibriTTS models.

Any idea why this happens?

Thanks!

rafaelvalle commented 4 years ago

Add a period to the end of your sentence.
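For anyone scripting inference, a minimal sketch of this suggestion is to make sure the input text ends with sentence-final punctuation before passing it to `-t`. The helper name `ensure_terminal_punctuation` is hypothetical and not part of the Flowtron repository. Treating `.`, `!` and `?` as acceptable endings is an assumption.

```python
def ensure_terminal_punctuation(text: str, terminator: str = ".") -> str:
    """Strip trailing whitespace and append a terminator if the text lacks one.

    Hypothetical helper, not part of Flowtron. Which characters count as
    sentence-final punctuation is an assumption.
    """
    stripped = text.rstrip()
    if not stripped:
        # Nothing to synthesise, so return the empty string unchanged.
        return stripped
    if stripped[-1] in ".!?":
        return stripped
    return stripped + terminator


print(ensure_terminal_punctuation("Hey hello there"))
```

The result can then be used as the value of `-t` in the inference command.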

adrianastan commented 4 years ago

Tried that right after running the first inference, same result: https://drive.google.com/file/d/1MHpx4GZpz8A8slnYNoQ8bJmka4Qq8BGf/view?usp=sharing

Slightly different sounds, but still not intelligible.

rafaelvalle commented 4 years ago

Pull from master and run the code below.

python inference.py -c config.json -p model_config.n_speakers=1 data_config.p_arpabet=1.0 -f models/flowtron_ljs.pt -w models/waveglow_256channels_universal_v4.pt -t "Hey. Hello there." -o output_synth/ -i 0 -s 0.5 -g 0.7
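If the command above needs to be run repeatedly with different text, it can be assembled programmatically. The sketch below only rebuilds the exact flags and values shown in the command. The function name `build_inference_cmd` and its parameter names are hypothetical. Reading `-i` as a speaker id, `-s` as sigma and `-g` as a gate threshold is an assumption, since the thread does not spell out what they mean.

```python
import shlex


def build_inference_cmd(
    text,
    config="config.json",
    flowtron_path="models/flowtron_ljs.pt",
    waveglow_path="models/waveglow_256channels_universal_v4.pt",
    output_dir="output_synth/",
    speaker_id=0,   # value passed to -i (assumed to be a speaker id)
    sigma=0.5,      # value passed to -s (assumed meaning)
    gate=0.7,       # value passed to -g (assumed meaning)
    overrides=("model_config.n_speakers=1", "data_config.p_arpabet=1.0"),
):
    """Return the inference command from the thread as an argument list."""
    return [
        "python", "inference.py",
        "-c", config,
        "-p", *overrides,
        "-f", flowtron_path,
        "-w", waveglow_path,
        "-t", text,
        "-o", output_dir,
        "-i", str(speaker_id),
        "-s", str(sigma),
        "-g", str(gate),
    ]


cmd = build_inference_cmd("Hey. Hello there.")
# shlex.join quotes the text argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

To execute it from Python instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list avoids shell quoting problems with the text argument.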

adrianastan commented 4 years ago

Great, that worked! Thank you very much.