coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Interrogative sentences do not sound like questions #1099

Closed Ca-ressemble-a-du-fake closed 2 years ago

Ca-ressemble-a-du-fake commented 2 years ago

Hi and first of all thanks for this great project!

🐛 Description

I tried the Colab notebook "Zero-shot-TTS-Demo" in French and the results were stunning. But I could not hear a difference between "Vous en avez beaucoup!" and "Vous en avez beaucoup?" (note the ! vs ? at the end). I tried with other interrogative sentences and I could not hear a clear question being said. So it does not seem that YourTTS handles question marks.

To Reproduce

Open the YourTTS-Zero-shot-TTS-demo notebook in Colab. Run all the cells, upload around 60 seconds of target speaker wav files when prompted, change the "text" variable to "Vous en avez beaucoup? Vous en avez beaucoup! Vous en avez beaucoup.", launch the synthesis and listen to the result. All three sentences sound the same.
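
To make the comparison easier to repeat, each punctuation variant could be synthesized as its own clip instead of one concatenated text. This is only a minimal sketch, not code from the notebook: `punctuation_variants` and `synthesize_variants` are hypothetical helpers, and `synthesize` is a placeholder for whatever function the notebook uses to turn a text into a wav file.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from terminal mark to a readable file-name suffix.
MARK_NAMES = {"?": "question", "!": "exclamation", ".": "statement"}


def punctuation_variants(sentence: str, marks: str = "?!.") -> List[str]:
    """Return the sentence once per terminal punctuation mark."""
    stem = sentence.rstrip("?!. ")  # drop any existing terminal punctuation
    return [stem + mark for mark in marks]


def synthesize_variants(
    sentence: str,
    synthesize: Callable[[str, str], None],  # assumed signature: (text, output_path)
    prefix: str = "variant",
) -> Dict[str, str]:
    """Call `synthesize` once per variant and return a text -> wav path mapping."""
    outputs: Dict[str, str] = {}
    for text in punctuation_variants(sentence):
        path = f"{prefix}_{MARK_NAMES.get(text[-1], 'other')}.wav"
        synthesize(text, path)
        outputs[text] = path
    return outputs


if __name__ == "__main__":
    # Stub synthesizer that only prints; replace it with the notebook's own call.
    synthesize_variants("Vous en avez beaucoup?", lambda text, path: print(text, "->", path))
```

Separate clips make it easier to compare the three endings side by side, and the same speaker reference can be reused for each call.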

Expected behavior

Input text featuring a question mark should be said like a question.
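
One rough way to check this expectation without relying on the ear alone is to look at the pitch movement at the end of each clip, since French yes/no questions marked only by intonation typically end with a rise. The sketch below works under stated assumptions: the per-frame F0 values (in Hz, with 0 for unvoiced frames) are assumed to have been extracted beforehand with a pitch tracker of your choice, and `final_pitch_slope` is a hypothetical helper, not part of TTS.

```python
from typing import Sequence


def final_pitch_slope(f0_hz: Sequence[float], tail_frames: int = 20) -> float:
    """Least-squares slope (Hz per voiced frame) over the last voiced frames.

    Unvoiced frames (F0 <= 0) are dropped before fitting, so the slope is
    expressed per voiced frame rather than per unit of real time.
    """
    voiced = [f for f in f0_hz if f > 0]
    tail = voiced[-tail_frames:]
    n = len(tail)
    if n < 2:
        return 0.0  # not enough voiced frames to fit a line
    mean_x = (n - 1) / 2
    mean_y = sum(tail) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(tail))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    rising = [100 + 2 * i for i in range(30)]   # synthetic rising contour
    falling = [160 - 2 * i for i in range(30)]  # synthetic falling contour
    print(final_pitch_slope(rising), final_pitch_slope(falling))
```

If the model handled the question mark, the "?" clip would be expected to show a clearly more positive final slope than the "." clip. Similar slopes across all three variants would be consistent with the behavior reported here.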

Environment

{
  "CUDA": {
    "GPU": [],
    "available": false,
    "version": "10.2"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "1.9.0+cu102",
    "TTS": "0.2.0",
    "numpy": "1.19.5"
  },
  "System": {
    "OS": "Linux",
    "architecture": [
      "64bit",
      ""
    ],
    "processor": "x86_64",
    "python": "3.7.12",
    "version": "#1 SMP Tue Dec 7 09:58:10 PST 2021"
  }
}

erogol commented 2 years ago

Thanks for your feedback. It is not really a bug but I think it is important to handle such things properly. We'll consider your feedback for the next released model.

Ca-ressemble-a-du-fake commented 2 years ago

@erogol Cool! Is there already a way to make a question sound the way it is supposed to? Maybe you are saying this is not possible in this Colab notebook, but if I fine-tuned the model on a dataset featuring questions, would it work?

skol101 commented 2 years ago

Curious: if the model is given enough interrogative samples, will it learn to pronounce text differently when it ends with a "?"?

erogol commented 2 years ago

I'm closing this, as it is not really a bug but perhaps a feature request. We'll keep it in mind for the next release of the model.