bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Unknown phonemes with espeak-mbrola and japanese voices #73

Closed itsupera closed 3 years ago

itsupera commented 3 years ago

Describe the bug

With the espeak-mbrola backend and Japanese voices, some of the generated phonemes are reported as "unknown" when passed to espeak-ng. This happens with mb-jp1, mb-jp2 and mb-jp3, although mb-jp2 seems to behave a bit better.

Phonemizer version

phonemizer-2.2.2
available backends: espeak-ng-1.50, espeak-mbrola, segments-2.2.0
uninstalled backends: festival

System

To reproduce

$ echo "めそっど" | phonemize -b espeak-mbrola -l mb-jp1 -p ' '
m e s o d d o
$ espeak-ng -v mb/mb-jp1 "[[m e s o d d o]]"   # similar issue for mb-jp2 and mb-jp3
mbrola: Warning: d-d unkown, replaced with _-_

$ echo "りかい" | phonemize -b espeak-mbrola -l mb-jp1 -p ' '
r i k a i
$ espeak-ng -v mb/mb-jp1 "[[r i k a i]]"   # works with mb-jp2, same issue for mb-jp3
mbrola: Warning: _-r unkown, replaced with _-_
mbrola: Warning: r-i unkown, replaced with _-_
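
The manual check above could be scripted to list which phone pairs are rejected for a given voice. The following is a minimal sketch, not part of phonemizer: the helper names `unknown_diphones` and `synthesize_and_check` are hypothetical, the regular expression assumes the warning format shown in the transcripts above (including the "unkown" spelling), and it is an assumption that the warnings arrive on stdout or stderr of the espeak-ng process.

```python
import re
import subprocess

# Matches lines such as "mbrola: Warning: d-d unkown, replaced with _-_".
# The spelling "unkown" is copied from the log lines in this report.
WARNING_RE = re.compile(r"^mbrola: Warning: (\S+)-(\S+) unkown, replaced with")


def unknown_diphones(log_text):
    """Return the (left, right) phone pairs reported as unknown, in order."""
    pairs = []
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def synthesize_and_check(phones, voice="mb/mb-jp1"):
    """Synthesize a phone string with espeak-ng and collect mbrola warnings.

    Assumes espeak-ng and the mbrola voice are installed. This plays audio,
    exactly like the manual command in the report.
    """
    result = subprocess.run(
        ["espeak-ng", "-v", voice, f"[[{phones}]]"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: the warnings may be on either stream, so scan both.
    return unknown_diphones(result.stdout + "\n" + result.stderr)
```

For example, `synthesize_and_check("r i k a i")` would be expected to report the `_-r` and `r-i` pairs with mb-jp1 if the behavior matches the transcript above.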

Expected behavior

The generated transcription should not cause "unknown" phonemes with espeak-ng.

mmmaat commented 3 years ago

Hi, I don't clearly understand the issue here. This is not related to phonemizer (or to espeak phonemization), but to espeak synthesis... In your example you are trying to synthesize some phonemized output, but the bug comes from the synthesis process.

By the way, I get the same bug when directly synthesizing the orthographic form:

$ espeak-ng -v mb/mb-jp1 "めそっど" 
mbrola: Warning: d-d unkown, replaced with _-_

itsupera commented 2 years ago

OK, I was not sure whether the phonemization or the synthesis was wrong.

Thank you for clarifying!