bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

EspeakBackend enters a corrupted state upon seeing some characters #133

Open: CorentinJ opened this issue 2 years ago

CorentinJ commented 2 years ago

Describe the bug

When phonemize is called on an EspeakBackend instance with the character "ꪁ", the backend enters a corrupted state: all subsequent phonemization (including of the text containing "ꪁ") is incorrect.
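Because the offending character is easy to mis-copy between systems, it can help to report its code point and Unicode name alongside the glyph. A stdlib-only sketch; describe is a hypothetical helper name:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    # unicodedata.name falls back to the default for unassigned code points.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


print(describe("ꪁ"))
```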

Phonemizer version
Phonemizer 3.2.1, Espeak NG 1.50

System
Reproduced the bug on both Windows 10 and Ubuntu.

To reproduce

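from phonemizer.backend import EspeakBackend

# The same sentence is phonemized before and after the problematic character "ꪁ".
texts = [
    "a, b, c, d, e, f, p, w, y, z",
    "ꪁ",
    "a, b, c, d, e, f, p, w, y, z"
]

backend = EspeakBackend(
    language="en-us", preserve_punctuation=True, with_stress=True,
    language_switch="remove-flags", words_mismatch="ignore"
)

# A single backend instance is reused, so any state left behind by "ꪁ"
# affects the third call.
for text in texts:
    print(backend.phonemize([text])[0])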
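Until the root cause is fixed, one possible workaround is to detect the corrupted state from application code: phonemize a fixed canary sentence after each call, compare it with the output recorded when the backend was fresh, and rebuild the backend on a mismatch. This is a sketch under assumptions: CanaryGuard, factory and canary are hypothetical names (not part of phonemizer), and it assumes a newly constructed backend does not inherit the corrupted state, which this report does not establish.

```python
from typing import Callable, List, Protocol


class Phonemizer(Protocol):
    """Anything with a phonemize(list[str]) -> list[str] method, e.g. EspeakBackend."""

    def phonemize(self, text: List[str]) -> List[str]: ...


class CanaryGuard:
    """Hypothetical wrapper that rebuilds the backend when a canary sentence drifts."""

    def __init__(self, factory: Callable[[], Phonemizer], canary: str = "a, b, c"):
        self._factory = factory
        self._canary = canary
        self._backend = factory()
        # Reference output, recorded while the backend is assumed to be clean.
        self._expected = self._backend.phonemize([canary])[0]
        self.resets = 0

    def phonemize(self, text: str) -> str:
        result = self._backend.phonemize([text])[0]
        # Re-check the canary: a changed output means the state was corrupted.
        if self._backend.phonemize([self._canary])[0] != self._expected:
            # Assumption: a fresh backend starts from a clean state.
            self._backend = self._factory()
            self.resets += 1
        # Note: if a reset happened, this result itself may already be wrong.
        return result
```

With phonemizer installed, factory could be a lambda that builds EspeakBackend with the same arguments as in the reproduction. The design doubles the number of phonemize calls, and the output of the call that triggered a reset should be treated as suspect, since the report says the text containing "ꪁ" is affected too.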

Expected behavior

Expected output:

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː 

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː

Actual output:

ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː 

ˈʌ , bˈʌ , sˈʌ , dˈʌ , ˈʌ , ˈʌf , pˈʌ , dˈʌbd-jʌ , wˈʌ , zˈʌ 
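To see exactly which tokens change, the expected and actual outputs for the third text can be compared token by token. A small sketch using the two strings quoted above; splitting on commas matches the preserve_punctuation=True output shown here but is an assumption for other inputs, and diff_tokens is a hypothetical helper:

```python
from typing import List, Tuple

EXPECTED = "ˈeɪ , bˈiː , sˈiː , dˈiː , ˈiː , ˈɛf , pˈiː , dˈʌbəljˌuː , wˈaɪ , zˈiː"
ACTUAL = "ˈʌ , bˈʌ , sˈʌ , dˈʌ , ˈʌ , ˈʌf , pˈʌ , dˈʌbd-jʌ , wˈʌ , zˈʌ"


def diff_tokens(expected: str, actual: str) -> List[Tuple[str, str]]:
    """Return (expected, actual) pairs for every comma-separated token that differs."""
    exp = [t.strip() for t in expected.split(",")]
    act = [t.strip() for t in actual.split(",")]
    return [(e, a) for e, a in zip(exp, act) if e != a]


for e, a in diff_tokens(EXPECTED, ACTUAL):
    print(f"{e:>12} -> {a}")
```

All ten tokens differ: the vowels collapse to ʌ, and the token for "w" is further mangled into dˈʌbd-jʌ.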
CorentinJ commented 2 years ago

I tried to reproduce the issue with espeak-ng alone, but it seems to behave correctly:

espeak-ng.exe -qx --ipa
>>> a, b, c, d, e, f, p, w, y, z
ˈeɪ
bˈiː
sˈiː
dˈiː
ˈiː
ˈɛf
pˈiː
dˈʌbəljˌuː
wˈaɪ
zˈɛd
>>> ꪁ

>>> a, b, c, d, e, f, p, w, y, z
ˈeɪ
bˈiː
sˈiː
dˈiː
ˈiː
ˈɛf
pˈiː
dˈʌbəljˌuː
wˈaɪ
zˈɛd
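The interactive session above sends all three lines to a single espeak-ng process, so the check can be scripted. A sketch, assuming espeak-ng is on PATH and, when given no text argument, reads stdin and handles each line separately; it uses only the -q, -x and --ipa flags from the session, and espeak_session and state_survives are hypothetical names. Keep in mind that this exercises the command-line front end, while phonemizer 3.x, as far as we know, calls the espeak-ng shared library directly, so a clean result here does not rule out a problem on the library path.

```python
import shutil
import subprocess
from typing import List


def espeak_session(lines: List[str], exe: str = "espeak-ng") -> str:
    """Feed all lines to ONE espeak-ng process via stdin and return its stdout."""
    proc = subprocess.run(
        [exe, "-q", "-x", "--ipa"],  # same flags as the interactive session
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return proc.stdout


def state_survives(probe: str, poison: str, exe: str = "espeak-ng") -> bool:
    """True if the probe's output is unchanged after the poison line in the same process."""
    baseline = espeak_session([probe], exe).strip()
    combined = espeak_session([probe, poison, probe], exe)
    # If no state leaks across lines, the baseline output appears twice.
    return combined.count(baseline) >= 2


if __name__ == "__main__" and shutil.which("espeak-ng"):
    print(state_survives("a, b, c, d, e, f, p, w, y, z", "ꪁ"))
```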