bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Error when phonemizing the word "wherever" in python3.8 environment #138

Closed: dmitrii-obukhov closed this issue 1 year ago

dmitrii-obukhov commented 1 year ago

Describe the bug

When phonemizing sentences that contain the word "wherever", the output is incorrect and nondeterministic, and the process sometimes crashes with a segmentation fault. The error occurs with python3.8; it did not occur with python3.7.

Phonemizer version

phonemizer-3.0.1
available backends: espeak-ng-1.49.2, segments-2.2.1
uninstalled backends: espeak-mbrola, festival

System

Operating System: Amazon Linux 2
Kernel: Linux 4.14.268-205.500.amzn2.x86_64
Architecture: x86-64
Python: 3.8.13

To reproduce

# setup environment
# (I assume that espeak is already installed, version 1.49.2)
python3 -m venv venv
. venv/bin/activate
pip install phonemizer==3.0.1

# reproduce an error
echo "wherever" | phonemize
echo "wherever you are" | phonemize

Expected behavior

Output should be:

echo "wherever" | phonemize
wɛɹɛvɚ 

echo "wherever you are" | phonemize
wɛɹɛvɚ juː ɑːɹ 

Actual behavior

Actual outputs:

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ ziəɹoʊ sɪks

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ 

echo "wherever" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚɹ ʌ naɪn

echo "wherever your are" | phonemize
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɛvɚ j m m m m m m m m jʊɹ ɑːɹ 

echo "wherever your are" | phonemize
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "en-us" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)
[WARNING] words count mismatch on 100.0% of the lines (1/1)
wɛɹɾɛvõ ʌ(gn) æ ziəɾoʊ jə ə 

echo "wherever your are" | phonemize
*** stack smashing detected ***: <unknown> terminated
Aborted

Additional context

The error does not reproduce with python3.7. With phonemizer 3.2.1 the error still occurs, but less often. No errors were found when phonemizing other words.
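Since the output changes from run to run, one way to put a number on "less often" is to run the same command repeatedly and tally the distinct outcomes. The following is only a minimal sketch and not part of phonemizer: `tally_outputs` is a hypothetical helper name, the run count of 20 is arbitrary, and it assumes the `phonemize` executable from the active venv is on PATH.

```python
import shutil
import subprocess
import sys
from collections import Counter


def tally_outputs(command, text, runs=20):
    """Run `command` `runs` times with `text` on stdin and count each distinct outcome."""
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            command,
            input=text.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # drop the [WARNING] lines, keep only the phones
        )
        if proc.returncode != 0:
            # On POSIX a negative return code means the process was killed by a
            # signal, e.g. -6 for SIGABRT (stack smashing) or -11 for SIGSEGV.
            outcomes[f"<exit {proc.returncode}>"] += 1
        else:
            outcomes[proc.stdout.decode("utf-8", errors="replace").strip()] += 1
    return outcomes


if __name__ == "__main__":
    # Assumption: `phonemize` is the command-line entry point installed by pip.
    if shutil.which("phonemize") is None:
        print("phonemize not found on PATH", file=sys.stderr)
    else:
        for outcome, count in tally_outputs(["phonemize"], "wherever\n").most_common():
            print(f"{count:3d}  {outcome}")
```

A healthy setup should print a single line with the expected transcription and the full run count. Several distinct lines, or `<exit ...>` entries, indicate the nondeterminism described above. Running the script once per phonemizer version in separate venvs would allow the failure rates to be compared.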

dmitrii-obukhov commented 1 year ago

It seems to be an espeak issue that is being discussed here.

mmmaat commented 1 year ago

Thanks for reporting. Indeed, this is related to espeak, not phonemizer.