olafthiele closed this issue 3 years ago
Hi, this is a limitation of espeak, not phonemizer. Here is an example of the raw output we get from standard espeak, as implemented in phonemizer:
$ echo 'hello world' | espeak-ng -v en-gb -q -x --ipa --sep=_
h_ə_l_ˈəʊ w_ˈɜː_l_d
And for mbrola:
$ echo 'hello world' | espeak-ng -v mb-en1 -q --pho
h 70
@ 24 0 94 20 95 40 96 59 97 80 99 100 99
l 65
@U 61 0 117 80 109 100 109
w 65
3: 96 0 102 80 76 100 76
5 65
d 65
_ 301
_ 1
But actually this is a bit more complicated... With the -x --sep=_ options we get word separation but not correct SAMPA output:
$ echo 'hello world' | espeak-ng -v mb-en1 -q -x --sep=_
h_@_l_'oU w_'3:_l_d
During the development of the mbrola backend I tried to mix the options --pho and -x --sep=_, then align the correct SAMPA output with the word-separated one, but sometimes the number of phonemes does not match between the two and the alignment becomes quite difficult...
$ echo 'hello world' | espeak-ng -v mb-en1 -q --pho -x --sep=_
h_@_l_'oU w_'3:_l_d
h 70
@ 24 0 94 20 95 40 96 59 97 80 99 100 99
l 65
@U 61 0 117 80 109 100 109
w 65
3: 96 0 102 80 76 100 76
5 65
d 65
_ 301
_ 1
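To make the alignment idea concrete, here is a minimal Python sketch (not the phonemizer implementation). It takes the two outputs shown above as plain strings, reads the phoneme column from the --pho lines, and regroups those phonemes into words using the per-word counts from the -x --sep=_ output. The function names are hypothetical, and the sketch assumes that `_` lines are pauses and that `'` and `,` are the only stress marks to strip.

```python
# Sample outputs copied from the espeak-ng commands in this thread.
PHO_OUTPUT = """\
h 70
@ 24 0 94 20 95 40 96 59 97 80 99 100 99
l 65
@U 61 0 117 80 109 100 109
w 65
3: 96 0 102 80 76 100 76
5 65
d 65
_ 301
_ 1
"""
SEP_OUTPUT = "h_@_l_'oU w_'3:_l_d"


def pho_phonemes(pho_text):
    """Return the first column of each --pho line, skipping '_' pauses (assumption)."""
    phonemes = []
    for line in pho_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "_":
            phonemes.append(fields[0])
    return phonemes


def sep_words(sep_text, sep="_"):
    """Split the -x --sep=_ output into words, each a list of phonemes.

    Leading ' and , are treated as stress marks and removed (assumption).
    """
    return [[p.lstrip("',") for p in word.split(sep)] for word in sep_text.split()]


def align_by_count(pho_text, sep_text):
    """Regroup --pho phonemes into words using per-word counts from sep_text.

    Returns None when the total phoneme counts differ, which is the hard case.
    """
    flat = pho_phonemes(pho_text)
    words = sep_words(sep_text)
    if sum(len(w) for w in words) != len(flat):
        return None
    grouped, start = [], 0
    for w in words:
        grouped.append(flat[start:start + len(w)])
        start += len(w)
    return grouped


print(align_by_count(PHO_OUTPUT, SEP_OUTPUT))
```

On the sample above both outputs contain 8 phonemes, so the call returns `[['h', '@', 'l', '@U'], ['w', '3:', '5', 'd']]`. When the counts differ the function returns None, and a pure count-based regrouping like this one cannot recover the word boundaries.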
Thanks, that clears up a lot. Sorry, another noob question. Could we feed just one word at a time? Even though we lose info on n-grams and it takes a bit longer.
Yes you can, but you may lose the pronunciation of word transitions (I'm not a linguist, I forgot the exact term...)
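A rough sketch of the one-word-at-a-time workaround could look like the following. It assumes espeak-ng is on the PATH with the mb-en1 voice installed, and it reuses the exact command shown earlier in this thread. `espeak_pho` and `phonemize_per_word` are hypothetical helper names, not part of phonemizer. As noted above, cross-word pronunciation effects are lost with this approach.

```python
import subprocess


def espeak_pho(word, voice="mb-en1"):
    """Run `espeak-ng -v <voice> -q --pho` on one word and return the raw output."""
    result = subprocess.run(
        ["espeak-ng", "-v", voice, "-q", "--pho"],
        input=word + "\n",      # same as `echo word | espeak-ng ...`
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def phonemize_per_word(text, run_word=espeak_pho):
    """Phonemize each whitespace-separated word on its own.

    Returns one list of phoneme symbols per word. Lines whose symbol is '_'
    are treated as pauses and dropped (assumption).
    """
    words = []
    for word in text.split():
        phonemes = []
        for line in run_word(word).splitlines():
            fields = line.split()
            if fields and fields[0] != "_":
                phonemes.append(fields[0])
        words.append(phonemes)
    return words
```

Calling `phonemize_per_word("hello world")` starts one espeak-ng process per word, which is why it is slower than a single call. The `run_word` parameter is there so the parsing can be tried without espeak-ng installed, by passing a function that returns canned --pho text.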
Thanks again. In ASR you would call that an n-gram, and there might be some problems connected with that. I have to think a bit about it for my German use case. I will try to hack something together. If I come up with something useful, I'll let you know.
I am trying to use the mbrola extension, thanks for that, but I have trouble with words being concatenated. Is this because the time was missing to implement that, or is it some sort of mbrola issue?