MycroftAI / mimic3

A fast local neural text to speech engine for Mycroft
GNU Affero General Public License v3.0

Only word choice affects whether a question is spoken like a question. #43

Open Tynach opened 1 year ago

Tynach commented 1 year ago

Describe the bug
I initially tried having it say simple things like "Hello. Hello? I did that. I did? No? No. No?!" and so on, to hear variation in how the text was spoken, such as a slight upward ramp in pitch toward the end of a sentence. However, I heard practically no variation in the default en_UK/apope_low voice no matter what I tried, so I assumed it simply couldn't produce different sounds for the same words.

However, after discovering that the non-default voices speak slightly differently every time I have them say the same thing, I started testing more. I found that even with --noise-scale 0 --noise-w 0, I could get these voices to produce things like that rising pitch at the end, but only if I unambiguously worded the sentence as a question to begin with.

This is most consistent with the en_US/ljspeech_low voice. The others often do sound like they're asking a question, but it's ambiguous. So this works, but not well.

To Reproduce
Compare the output audio for the following commands:

  1. mimic3 -v en_US/ljspeech_low --noise-scale 0 --noise-w 0 'Where was it?'
  2. mimic3 -v en_US/ljspeech_low --noise-scale 0 --noise-w 0 'Where was it.'
  3. mimic3 -v en_US/ljspeech_low --noise-scale 0 --noise-w 0 'That was it?'
  4. mimic3 -v en_US/ljspeech_low --noise-scale 0 --noise-w 0 'That was it.'

Expected behavior
The 'was it' at the end of commands 1 and 3 above should be spoken as if they were questions. The 'was it' at the end of commands 2 and 4 should be spoken as if they were statements.

Instead, it speaks 1 and 2 completely identically, as if both of them are questions. It speaks 3 and 4 identically as well, but as if they are both statements.
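A minimal sketch of how the comparison could be scripted, assuming the `mimic3` CLI is on PATH, writes WAV data to stdout when stdout is redirected, and accepts the `-v`, `--noise-scale`, and `--noise-w` flags exactly as in the commands above. It renders the four prompts and reports which outputs are byte-identical. The helper names (`build_command`, `synthesize`, `fingerprint`, `identical_pairs`) are hypothetical. With both noise settings at 0 the output should be deterministic, so matching hashes for 1 and 2 (and for 3 and 4) would back up the observation; differing hashes would not necessarily refute it, since audio can differ in inaudible ways such as trailing silence.

```python
"""Sketch: render the four reproduction prompts and check which are byte-identical.

Assumption: the `mimic3` CLI is on PATH and writes WAV audio to stdout
when stdout is redirected (as in `mimic3 'Hello world.' > hello.wav`).
"""
import hashlib
import shutil
import subprocess
import sys

VOICE = "en_US/ljspeech_low"
PROMPTS = ["Where was it?", "Where was it.", "That was it?", "That was it."]


def build_command(text: str, voice: str = VOICE) -> list[str]:
    # Same flags as the reproduction steps; zero noise should remove run-to-run variation.
    return ["mimic3", "-v", voice, "--noise-scale", "0", "--noise-w", "0", text]


def synthesize(text: str) -> bytes:
    # Capture the WAV bytes that mimic3 writes to stdout (assumed behavior).
    result = subprocess.run(build_command(text), capture_output=True, check=True)
    return result.stdout


def fingerprint(audio: bytes) -> str:
    # A 16-character SHA-256 prefix is enough to spot byte-identical outputs.
    return hashlib.sha256(audio).hexdigest()[:16]


def identical_pairs(outputs: dict[str, bytes]) -> list[tuple[str, str]]:
    # Return every pair of prompts whose audio is byte-for-byte identical.
    texts = list(outputs)
    return [
        (a, b)
        for i, a in enumerate(texts)
        for b in texts[i + 1:]
        if outputs[a] == outputs[b]
    ]


def main() -> None:
    if shutil.which("mimic3") is None:
        print("mimic3 not found on PATH", file=sys.stderr)
        return
    outputs = {text: synthesize(text) for text in PROMPTS}
    for text, audio in outputs.items():
        print(f"{fingerprint(audio)}  {text!r}")
    for a, b in identical_pairs(outputs):
        print(f"identical: {a!r} == {b!r}")


if __name__ == "__main__":
    main()
```

If the report holds, the script would print `identical:` lines for the 'Where was it' pair and the 'That was it' pair. For an audible check, the same bytes can also be written to .wav files and played back side by side.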

Environment (please complete the following information):