Hi, thanks for reporting. Unfortunately this comes from the espeak implementation, not from phonemizer itself:
$ phonemize --version
phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1
$ echo 'ന' | espeak-ng -x -q --ipa -v en-us
mæleɪˈɑːləm(ml)nˈɐ(en-us)
$ echo 'ന' | espeak-ng -x -q --ipa -v ml
nˈɐ
$ echo 'ആനേ' | espeak-ng -x -q --ipa -v en-us
(ml)ˈaːneː(en-us)
I think this is a very special case: if you try with a full word instead of a single character, the language name is no longer prepended. I suggest writing custom post-processing code, or experimenting with the regex that detects language switches here.
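As an illustration of the post-processing route, here is a minimal sketch in plain Python. It is not phonemizer's own code: the function name `strip_language_switches` and the regex are hypothetical, and it assumes that the spoken language name is always glued directly to the opening flag (as in `mæleɪˈɑːləm(ml)` above), that regular text is separated from a flag by whitespace, and that the main language is `en-us`.

```python
import re

# Hypothetical helper, not part of phonemizer.
# Assumption: a language flag looks like "(ml)" or "(en-us)".
_LANG_CODE = r"[a-z]{2,3}(?:-[a-z0-9]+)*"


def strip_language_switches(text, main_language="en-us"):
    """Remove espeak language-switch flags and a glued language-name prefix."""
    main_flag = r"\(" + re.escape(main_language) + r"\)"

    # Opening flag for any language other than the main one, together with
    # the run of non-space characters attached right before it (assumed to
    # be the spoken language name, e.g. "mæleɪˈɑːləm").
    opening = re.compile(r"[^\s()]*\((?!" + re.escape(main_language) + r"\))"
                         + _LANG_CODE + r"\)")
    text = opening.sub("", text)

    # Closing flag that switches back to the main language.
    return re.sub(main_flag, "", text)


if __name__ == "__main__":
    for sample in ("mæleɪˈɑːləm(ml)nˈɐ(en-us)", "(ml)ˈaːneː(en-us)"):
        print(repr(strip_language_switches(sample)))
```

The heuristic drops anything attached to an opening flag without a space, so it should be checked against your own data before relying on it.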
Describe the bug
When using phonemizer on a single Unicode character, the phonemized name of the detected language appears as a prefix in the output.
Phonemizer version
home@home-desktop:$ phonemize --version
phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1
System
home@home-desktop:$ uname -a
Linux home-desktop 5.15.0-88-generic #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Python 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0] :: Anaconda, Inc. on linux
To reproduce
Expected behavior
The prefix 'mæleɪˈɑːləm' is not expected. Is there a way to suppress it? By the way, if I initialize the language as 'ml', the prefix is not there.
Additional context
It looks like the language_switch handling does not take care of single characters.