bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Chinese Mandarin, incoherent output and stack smashing #136

Closed pasinit closed 2 years ago

pasinit commented 2 years ago

Describe the bug I am trying to run the espeak backend on Chinese Mandarin. However, I get different results when feeding the same input multiple times, and eventually the process aborts with "stack smashing detected".

Phonemizer version 3.0.1

System Ubuntu 18.04

To reproduce

from phonemizer.punctuation import Punctuation
from phonemizer.backend import EspeakBackend
backend = EspeakBackend(
    'cmn',
    punctuation_marks=Punctuation.default_marks(),
    preserve_punctuation=False,
    with_stress=False,
    tie=False,
    language_switch='keep-flags',
    words_mismatch='ignore',
)
backend.phonemize(['相'])
# ['ɕiɑŋji2() i2 ']
backend.phonemize(['相'])
# ['əəəəəə ']
backend.phonemize(['相'])
# *** stack smashing detected ***: <unknown> terminated
# Aborted

Expected behavior All three calls should return the same output.
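
The expected behavior can be checked mechanically. Below is a minimal sketch, not part of phonemizer: `distinct_outputs` is a hypothetical helper name that calls any phonemize-like function several times on the same input and collects the distinct results. It assumes the function takes a list of strings and returns a list of strings, as `backend.phonemize` does in the snippet above. It cannot catch the "stack smashing detected" abort, since that kills the whole process.

```python
from typing import Callable, List, Set


def distinct_outputs(
    phonemize: Callable[[List[str]], List[str]],
    text: str,
    runs: int = 5,
) -> Set[str]:
    """Call `phonemize` on the same input `runs` times.

    Returns the set of distinct outputs; a deterministic backend
    yields a set with exactly one element.
    """
    return {phonemize([text])[0] for _ in range(runs)}


# Hypothetical usage with the backend from the report
# (requires phonemizer and espeak-ng to be installed):
#   outputs = distinct_outputs(backend.phonemize, '相')
#   assert len(outputs) == 1, outputs
```

With the backend above, more than one element in the returned set reproduces the inconsistency described in this report.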


mmmaat commented 2 years ago

Hi, I cannot reproduce your bug. I get a consistent output, ɕiɑ5ŋ, for that input. I'm using:

$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0

I guess you are using an old version of espeak. Maybe try upgrading to phonemizer-3.2 and espeak-ng-1.50?
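
To check which espeak-ng version is installed without reading the output by eye, the text printed by phonemize --version can be parsed. This is a rough sketch under stated assumptions: `espeak_version` is a hypothetical helper, not a phonemizer API, and it assumes the "espeak-ng-X.Y[.Z]" format shown in the output above.

```python
import re
from typing import Optional, Tuple


def espeak_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the espeak-ng version from `phonemize --version` output.

    Returns a tuple of ints such as (1, 50), or None if no
    "espeak-ng-<version>" entry is found.
    """
    match = re.search(r"espeak-ng-(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Sample text copied from the output above.
sample = (
    "phonemizer-3.2.0\n"
    "available backends: espeak-ng-1.50, espeak-mbrola, "
    "festival-2.5.0, segments-2.2.0\n"
)
version = espeak_version(sample)
# Tuples compare element by element, so (1, 49, 2) < (1, 50).
is_recent_enough = version is not None and version >= (1, 50)
```

The (1, 50) threshold is simply the version that gives consistent output here; it is not an official minimum.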

pasinit commented 2 years ago

Because my Ubuntu version is 18.04, I am stuck with espeak-ng-1.49.2+dfsg-1.

I upgraded phonemizer to 3.2.0, but now the outputs I get are the following:

backend.phonemize(['相'])
> ['ɕɡŋjits(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ']

This does not look right, and it also varies across runs.

This is the output of phonemize --version:

$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.49.2, festival-2.5.0, segments-2.2.0
uninstalled backends: espeak-mbrola

mmmaat commented 2 years ago

To use espeak-ng-1.50, you can either build it from source or use the phonemizer Docker image.

pasinit commented 2 years ago

A quick update: with espeak-ng-1.50 I can reproduce @mmmaat's output. Note that if I compile from the most recent sources (1.52), the output is different and looks wrong (but is at least consistent). For future reference, I installed espeak from this link. Thanks for your help @mmmaat