Voice samples from https://mycroftai.github.io/mimic3-voices/ sound completely different

MycroftAI / mimic3-voices

Voice models for Mimic 3 text to speech system

Creative Commons Attribution Share Alike 4.0 International


Voice samples from https://mycroftai.github.io/mimic3-voices/ sound completely different #5

Closed distbit0 closed 1 year ago

distbit0 commented 1 year ago

Describe the bug

At least for the cmu-arctic_low voices (the only ones I have tried), the samples at https://mycroftai.github.io/mimic3-voices/ sound completely different from what I hear when I run the voices from my terminal. For example, I really like the sample of cmu-arctic_low#7, but when I use either --voice cmu-arctic_low#7 or cmu-arctic_low#lnh, the output sounds mumbled and unclear compared with the sample: https://mycroftai.github.io/mimic3-voices/samples/en_US/cmu-arctic_low/sample_lnh.wav

The command I am using is: bash -c "mimic3 --voice en_US/cmu-arctic_low#7 --interactive \"$(xclip -selection primary -out)\""

I am running mimic3 0.2.4.

Expected behavior

I expected the samples to match the voices I hear when running mimic3 from the terminal.

Environment (please complete the following information):

Device type: Asus laptop
OS: Ubuntu
Mycroft-core version: ?
Other versions: ?

synesthesiam commented 1 year ago

Make sure you are adding punctuation. It's really important that sentences end with a period.

distbit0 commented 1 year ago

I am 100% sure all sentences had punctuation. The difference was not in the intonation, though; it sounded like a completely different voice. @synesthesiam

Thx for replying btw :) Would be great to get this working, as the voices on the website sound great.

synesthesiam commented 1 year ago

Np, I'll try to help. Can you try deleting the voice cache in $HOME/.local/share/mycroft/mimic3/voices/en_US/cmu-arctic_low and see if that helps?

Did you install from source or another way?
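For anyone who would rather script the cache reset than delete the folder by hand, here is a minimal Python sketch. It assumes the cache lives at the path quoted above; `clear_voice_cache` and `DEFAULT_VOICES_DIR` are hypothetical names for illustration, not part of Mimic 3.

```python
import shutil
from pathlib import Path

# Assumption: default cache root taken from the path quoted in this thread.
DEFAULT_VOICES_DIR = Path.home() / ".local/share/mycroft/mimic3/voices"


def clear_voice_cache(voice_key: str, voices_dir: Path = DEFAULT_VOICES_DIR) -> bool:
    """Delete the cached files for one voice, e.g. "en_US/cmu-arctic_low".

    Returns True if a directory was removed, False if nothing was there.
    """
    root = Path(voices_dir).resolve()
    target = (root / voice_key).resolve()
    # Refuse to delete the root itself or anything outside the voices directory.
    if root not in target.parents:
        raise ValueError(f"{target} is not inside {root}")
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Calling `clear_voice_cache("en_US/cmu-arctic_low")` removes only that voice's folder. Presumably the voice is fetched again the next time it is requested, which matches what was observed in this thread.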

distbit0 commented 1 year ago

Ok yeah, that seems to have fixed it! That said, the voices still sound much worse when I set length-scale to 0.3 than when I simply play the audio from mycroftai.github.io/mimic3-voices at 3.33x speed.
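The 0.3 and 3.33x figures above are two views of the same number, assuming the length scale simply multiplies phoneme durations so that speaking rate is roughly its reciprocal. A small sketch of that arithmetic (both helper names are hypothetical):

```python
def length_scale_to_speed(length_scale: float) -> float:
    """Approximate speed-up factor for a given length scale (assumed reciprocal)."""
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    return 1.0 / length_scale


def speed_to_length_scale(speed: float) -> float:
    """Length scale that would give roughly the requested speed-up factor."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 1.0 / speed


print(round(length_scale_to_speed(0.3), 2))  # 3.33
```

So a length scale of 0.3 asks the model to speak about 3.33 times faster, the same factor as speeding up the finished audio in a player. The two routes differ in where the speed-up happens, which may explain why they do not sound alike.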

distbit0 commented 1 year ago

Do you know of any way I can increase the playback speed without causing significant quality degradation, by any chance? @synesthesiam many thx

synesthesiam commented 1 year ago

I've had good luck with the sonic utility (apt-get install sonic). It lets you adjust playback speed without too much pitch distortion. I believe this is what espeak-ng uses too.

If you adjust the length scale to go faster, I'd also recommend trying to lower the noise scale or noise w.
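A rough Python sketch of the sonic route, for a WAV file that Mimic 3 has already written. It assumes the sonic CLI takes `-s <speed> infile outfile`, which is how its usage text reads as far as I know; check `man sonic` on your system. The function names are made up for illustration.

```python
import shutil
import subprocess
from typing import List


def build_sonic_command(src_wav: str, dst_wav: str, speed: float) -> List[str]:
    """Build the argument list for the sonic CLI (assumed: sonic -s SPEED in out)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return ["sonic", "-s", str(speed), src_wav, dst_wav]


def speed_up_wav(src_wav: str, dst_wav: str, speed: float) -> None:
    """Run sonic on an existing WAV file; raises if sonic is missing or fails."""
    if shutil.which("sonic") is None:
        raise FileNotFoundError("sonic not found on PATH (try: apt-get install sonic)")
    subprocess.run(build_sonic_command(src_wav, dst_wav, speed), check=True)
```

For example, `speed_up_wav("speech.wav", "speech_fast.wav", 3.33)` would write a sped-up copy while leaving the voice at its normal length scale.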

distbit0 commented 1 year ago

Thanks @synesthesiam this is great. Do you know if there is any way to increase audio speed w/o waiting for the audio generation to end? It seems sonic cannot accept piped audio data. Many thx :)

synesthesiam commented 1 year ago

This appears to be just a limitation of the sonic CLI, but not of sonic itself. Any chance you know C?
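Short of patching the sonic CLI, one possible workaround is to cut the text into sentences and handle each as its own small WAV file, so the first sentence can be sped up and played while later ones are still being generated. This was not proposed in the thread; it is only a sketch of the chunking step, and the function names and file naming scheme are assumptions.

```python
import re
from typing import Iterator, List, Tuple

# Split after ., ! or ? when followed by whitespace (a deliberately naive rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation on each one."""
    parts = _SENTENCE_END.split(text.strip())
    return [p for p in parts if p]


def sentence_jobs(text: str, prefix: str = "chunk") -> Iterator[Tuple[str, str, str]]:
    """Yield (sentence, raw_wav_name, fast_wav_name) for each sentence in order."""
    for i, sentence in enumerate(split_sentences(text)):
        yield sentence, f"{prefix}_{i:03d}_raw.wav", f"{prefix}_{i:03d}_fast.wav"
```

Each job could then be synthesized to its raw file, run through sonic to produce the fast file, and played, with the next job starting in the meantime. Keeping the punctuation on every chunk matters, given the earlier advice that sentences should end with a period.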