RichardGale opened this issue 6 years ago.
Pitch can be adjusted with the following intonation features:
mimic --setf int_f0_target_mean=120 -t "hello"
mimic --setf int_f0_target_stddev=0 --setf int_f0_target_mean=120 -t "hello"
The pitch varies less when you reduce the standard deviation (int_f0_target_stddev).
This works with most of the voices, and the sung pitch stays consistent. (For me, rms won't change pitch and awb_time just doesn't work.)
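To compare settings across voices, it can help to generate these command lines programmatically. The Python sketch below only assembles the argument list for the mimic --setf ... -t ... invocation shown above. The helper name mimic_args is hypothetical, and actually running the commands assumes mimic is on your PATH.

```python
import shlex


def mimic_args(text, f0_mean=None, f0_stddev=None):
    """Build an argv list for mimic using the int_f0_target_* features.

    Hypothetical helper: it only assembles arguments; nothing is executed here.
    """
    args = ["mimic"]
    if f0_stddev is not None:
        args += ["--setf", f"int_f0_target_stddev={f0_stddev}"]
    if f0_mean is not None:
        args += ["--setf", f"int_f0_target_mean={f0_mean}"]
    args += ["-t", text]
    return args


if __name__ == "__main__":
    # Flat pitch (stddev 0) at a few target means, to hear how the value maps to pitch.
    for mean in (50, 100, 200, 400):
        cmd = mimic_args("hello", f0_mean=mean, f0_stddev=0)
        print(shlex.join(cmd))
        # To actually speak, pass cmd to subprocess.run(cmd, check=True).
```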
It shouldn't be too difficult to add support for adjusting this feature via SSML in https://github.com/MycroftAI/mimic/blob/master/src/synth/cst_ssml.c.
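What might be more difficult is mapping these numbers to something easier for a musician to use. (The SSML standard suggests Hertz. At first int_f0_target_mean seemed to be something like a percentage, since 50, 100, 200 and 400 all seem to be an octave apart, so perhaps we could write a function from Hertz to the int_f0_target value.)
Edit: int_f0_target_mean appears to be in Hertz already. That fits the observation above: doubling a frequency raises the pitch by exactly one octave, so 50, 100, 200 and 400 Hz are each an octave apart.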
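Since int_f0_target_mean is in Hertz, converting between it and musical intervals is logarithmic: moving by n equal-tempered semitones multiplies the frequency by 2^(n/12). A minimal Python sketch (the function names are illustrative, not part of mimic):

```python
import math


def semitones_between(f_low_hz, f_high_hz):
    """Interval from f_low_hz up to f_high_hz in equal-tempered semitones."""
    return 12 * math.log2(f_high_hz / f_low_hz)


def shift_hz(f_hz, semitones):
    """Frequency reached by moving f_hz up (or down, if negative) by some semitones."""
    return f_hz * 2 ** (semitones / 12)


# The int_f0_target_mean values tried above: each step doubles the frequency.
means = [50, 100, 200, 400]
for low, high in zip(means, means[1:]):
    print(f"{low} Hz -> {high} Hz: {semitones_between(low, high)} semitones")
```

Each pair comes out at 12 semitones, i.e. one octave, which is what you would expect if the feature is a plain frequency in Hz.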
Google and Cepstral both specify pitch changes in relative semitones.
The following example uses the <prosody> element to speak slowly, 2 semitones lower than normal: <prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
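A relative value like "-2st" only becomes a concrete frequency once it is resolved against some baseline pitch. The Python sketch below does that resolution; the helper name and the 120 Hz baseline (borrowed from the int_f0_target_mean=120 commands above) are assumptions for illustration, not existing mimic behavior.

```python
import re

# Matches SSML-style relative semitone values such as "-2st", "+3st" or "1.5st".
_SEMITONES = re.compile(r"^([+-]?\d+(?:\.\d+)?)st$")


def relative_pitch_to_hz(pitch, baseline_hz):
    """Resolve a relative semitone pitch against baseline_hz (hypothetical helper)."""
    match = _SEMITONES.match(pitch.strip())
    if match is None:
        raise ValueError(f"not a relative semitone value: {pitch!r}")
    semitones = float(match.group(1))
    return baseline_hz * 2 ** (semitones / 12)


# Assumed baseline: 120 Hz, as in the int_f0_target_mean=120 examples.
target = relative_pitch_to_hz("-2st", baseline_hz=120)
print(f"--setf int_f0_target_mean={target:.2f}")
```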
Pitch is supported since #168 was merged.
Your specific example is a little more complicated.
First, you'd want to do some pre-processing (perhaps with lexconvert.py?) to split your words into syllables. You'd also need to convert each note name to a frequency (using a table like this).
<prosody pitch="349.23Hz" range="0"><phoneme ph="D EY">Dai</phoneme></prosody>
<prosody pitch="293.66Hz" range="0"><phoneme ph="S IY">sy</phoneme></prosody>
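The two pre-processing steps can be sketched in Python. Syllables and phonemes are written by hand here, standing in for the lexconvert.py step, and note names are converted with the 12-tone equal-temperament formula, assuming A4 = 440 Hz and scientific pitch notation (C4 = middle C). The helper names are hypothetical, the markup simply mirrors the example above using the Hz form of the pitch value, and it is not verified here that mimic's SSML parser accepts every attribute emitted.

```python
from xml.sax.saxutils import escape, quoteattr

# Semitone offset of each natural note from A within the same octave number.
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


def note_to_hz(note, a4_hz=440.0):
    """Convert a note name such as "F4", "Bb3" or "C#5" to Hz (equal temperament)."""
    letter, rest = note[0].upper(), note[1:]
    accidental = 0
    while rest and rest[0] in "#b":  # sharps raise, flats lower by one semitone
        accidental += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    octave = int(rest)
    semitones_from_a4 = NOTE_OFFSETS[letter] + accidental + 12 * (octave - 4)
    return a4_hz * 2 ** (semitones_from_a4 / 12)


def syllable_markup(text, phonemes, note):
    """One <prosody> element per syllable, pitch in Hz, range 0 (flat intonation)."""
    hz = note_to_hz(note)
    return (f'<prosody pitch="{hz:.2f}Hz" range="0">'
            f"<phoneme ph={quoteattr(phonemes)}>{escape(text)}</phoneme></prosody>")


# "Daisy" split by hand into two syllables sung on F4 and D4.
for syllable, phonemes, note in [("Dai", "D EY", "F4"), ("sy", "S IY", "D4")]:
    print(syllable_markup(syllable, phonemes, note))
```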
Then you'd also want https://github.com/MycroftAI/mimic1/issues/190 to be supported, to control duration.
In which directory can I find lexconvert.py? I installed mimic in /home/pi/mimic1 on a Raspberry Pi.
Replying to shyam3089 (June 24, 2020, 10:26:26 AM EDT):
It's an external project; its homepage is https://ssb22.user.srcf.net/gradint/lexconvert.html. You can download it with:
wget https://ssb22.user.srcf.net/gradint/lexconvert.py
I'm looking to switch from Festival to a Flite derivative and need something with functionality equivalent to the PITCH markup, e.g.
<DURATION BEATS="1.0,1.0"><PITCH NOTE="F4,D4">Daisy</PITCH></DURATION>
<DURATION BEATS="1.0,1.0"><PITCH NOTE="Bb3,F3">Daisy</PITCH></DURATION>
from the Festival examples. I see that Flite supports local volume and rate changes via SSML attributes on the PROSODY tag. Is the pitch attribute on the roadmap, or is it even possible?
I see that Sinsy, also based on Flite, can change pitch; however, it does so through label-based markup, which seems entirely different from the PROSODY tags.
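Porting the Festival example to Flite/mimic SSML mostly means reading one note and one beat value per syllable out of the nested DURATION/PITCH markup. A minimal sketch using Python's standard xml.etree.ElementTree, assuming the comma-separated NOTE and BEATS values line up with the word's syllables in order (Dai-sy on F4 and D4). The function name and the <song> wrapper are hypothetical, and turning the notes into prosody pitch values (and beats into durations) still depends on what mimic's SSML support provides.

```python
import xml.etree.ElementTree as ET

# The Festival-style markup from above, wrapped in a <song> element only so that
# the fragment parses as a single XML document (the wrapper name is arbitrary).
FESTIVAL_FRAGMENT = """<song>
<DURATION BEATS="1.0,1.0"><PITCH NOTE="F4,D4">Daisy</PITCH></DURATION>
<DURATION BEATS="1.0,1.0"><PITCH NOTE="Bb3,F3">Daisy</PITCH></DURATION>
</song>"""


def parse_festival_pitch(xml_text):
    """Return [(word, [(note, beats), ...]), ...] from nested DURATION/PITCH elements."""
    words = []
    for duration in ET.fromstring(xml_text).iter("DURATION"):
        beats = [float(b) for b in duration.get("BEATS", "").split(",")]
        pitch = duration.find("PITCH")
        if pitch is None:
            raise ValueError("DURATION element without a nested PITCH element")
        notes = pitch.get("NOTE", "").split(",")
        if len(notes) != len(beats):
            raise ValueError(f"{len(notes)} notes but {len(beats)} beat values")
        # Pair each syllable's note with its beat count, in order.
        words.append(((pitch.text or "").strip(), list(zip(notes, beats))))
    return words


for word, syllables in parse_festival_pitch(FESTIVAL_FRAGMENT):
    print(word, syllables)
```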