bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Why not word separation for mbrola? #57

Closed olafthiele closed 3 years ago

olafthiele commented 3 years ago

I am trying to use the mbrola extension (thanks for that), but I have trouble with words being concatenated. Is this because there was no time to implement word separation, or is it some sort of mbrola issue?

mmmaat commented 3 years ago

Hi, this is a limitation of espeak, not phonemizer. Here is an example of the raw output we get for standard espeak, as used in phonemizer:

$ echo 'hello world' | espeak-ng -v en-gb -q -x --ipa --sep=_
h_ə_l_ˈəʊ w_ˈɜː_l_d

And for mbrola:

$ echo 'hello world' | espeak-ng -v mb-en1 -q --pho
h   70
@   24   0 94 20 95 40 96 59 97 80 99 100 99
l   65
@U  61   0 117 80 109 100 109
w   65
3:  96   0 102 80 76 100 76
5   65
d   65
_   301
_   1
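To make the missing word boundary concrete, here is a minimal Python sketch that reads the phoneme column out of `--pho` style output. The function name `parse_pho` is hypothetical (it is not part of phonemizer), and the sketch assumes the first whitespace-separated field of each line is the phoneme and that lines starting with `;` are comments.

```python
def parse_pho(pho_output):
    """Return the phoneme column of `espeak-ng --pho` style output.

    Assumption: the first field of each line is the phoneme symbol;
    the remaining fields (duration, pitch targets) are ignored.
    """
    phones = []
    for line in pho_output.splitlines():
        fields = line.split()
        # Skip blank lines and (assumed) ';' comment lines.
        if not fields or fields[0].startswith(";"):
            continue
        phones.append(fields[0])
    return phones


# The output shown above for 'hello world' with the mb-en1 voice.
SAMPLE = """\
h   70
@   24   0 94 20 95 40 96 59 97 80 99 100 99
l   65
@U  61   0 117 80 109 100 109
w   65
3:  96   0 102 80 76 100 76
5   65
d   65
_   301
_   1
"""

print(parse_pho(SAMPLE))
# ['h', '@', 'l', '@U', 'w', '3:', '5', 'd', '_', '_']
```

Nothing in the result marks where 'hello' ends and 'world' begins: the only `_` entries come at the very end of the utterance.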

But actually this is a bit more complicated... With the -x --sep=_ options we get word separation but not a correct SAMPA output:

$ echo 'hello world' | espeak-ng -v mb-en1 -q  -x --sep=_
h_@_l_'oU w_'3:_l_d

During the development of the mbrola backend I tried to mix the options --pho and -x --sep=_, then align the correct SAMPA output with the word-separated one, but sometimes the number of phonemes does not match between the two and the alignment becomes quite difficult...

$ echo 'hello world' | espeak-ng -v mb-en1 -q --pho  -x --sep=_
h_@_l_'oU w_'3:_l_d
h   70
@   24   0 94 20 95 40 96 59 97 80 99 100 99
l   65
@U  61   0 117 80 109 100 109
w   65
3:  96   0 102 80 76 100 76
5   65
d   65
_   301
_   1
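
The alignment idea described above can be sketched roughly as follows. This is not the code that was tried in phonemizer, only an illustration under stated assumptions: the helper name `align_by_count` is hypothetical, every `_` entry in the `--pho` phonemes is treated as a pause and dropped, and the phonemes are regrouped purely by the per-word counts taken from the `-x --sep=_` line. When the totals disagree, the sketch gives up and returns `None`, which is exactly the case that makes the approach fragile.

```python
def align_by_count(x_line, pho_phones, sep="_"):
    """Regroup --pho phonemes into words using counts from the -x line.

    x_line:     word-separated output, e.g. "h_@_l_'oU w_'3:_l_d"
    pho_phones: phoneme symbols read from the --pho output
    Returns a list of words (each a list of phonemes), or None when
    the two outputs do not contain the same number of phonemes.
    """
    # Number of phonemes per word according to the -x output.
    counts = [len([p for p in word.split(sep) if p])
              for word in x_line.split()]
    # Assumption: '_' marks a pause in the --pho output, not a phoneme.
    phones = [p for p in pho_phones if p != "_"]

    if sum(counts) != len(phones):
        return None  # counts differ, no simple alignment is possible

    words, start = [], 0
    for n in counts:
        words.append(phones[start:start + n])
        start += n
    return words


x_line = "h_@_l_'oU w_'3:_l_d"
pho_phones = ["h", "@", "l", "@U", "w", "3:", "5", "d", "_", "_"]
print(align_by_count(x_line, pho_phones))
# [['h', '@', 'l', '@U'], ['w', '3:', '5', 'd']]
```

For 'hello world' both outputs contain eight phonemes, so the regrouping works. A count-based split cannot recover from a single extra or missing phoneme in either output.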

olafthiele commented 3 years ago

Thanks, that clears up a lot. Sorry, another noob question: could we feed just one word at a time? Even though we lose info on n-grams and it takes a bit longer.

mmmaat commented 3 years ago

Yes you can, but you may lose the pronunciation of word transitions (I'm not a linguist, I forgot the exact term...)
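
A rough Python sketch of the word-by-word workaround, for illustration only. Both function names are hypothetical and not part of phonemizer. `espeak_pho` assumes `espeak-ng` and the `mb-en1` mbrola voice are installed, reuses the command line shown earlier in this thread, and treats `_` lines as pauses to discard.

```python
import subprocess


def espeak_pho(word, voice="mb-en1"):
    """Run `espeak-ng -v VOICE -q --pho` on one word, return its phonemes.

    Assumption: espeak-ng and the mbrola voice are installed locally.
    """
    result = subprocess.run(
        ["espeak-ng", "-v", voice, "-q", "--pho"],
        input=word, capture_output=True, text=True, check=True,
    )
    phones = []
    for line in result.stdout.splitlines():
        fields = line.split()
        # Keep the phoneme column, drop blank lines and '_' pauses.
        if fields and fields[0] != "_":
            phones.append(fields[0])
    return phones


def phonemize_by_word(text, phonemize_word=espeak_pho):
    """Phonemize each whitespace-separated word independently.

    Word boundaries are preserved by construction, but each word is
    pronounced in isolation, so effects at word transitions are lost.
    """
    return [phonemize_word(word) for word in text.split()]
```

Calling `phonemize_by_word("hello world")` spawns one espeak-ng process per word, which is why this takes longer than phonemizing the whole sentence at once.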

olafthiele commented 3 years ago

Thanks again. In ASR you would call that n-grams, and there might be some problems connected with it. I have to think a bit about it for my German use case. Will try to hack something. If I come up with something useful, I'll let you know.