bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

phonemize a text which has more than 100K utterances #93

Closed marianasignal closed 2 years ago

marianasignal commented 2 years ago

Describe the bug A clear and concise description of what the bug is.

Phonemizer version The output of phonemize --version from command line, very helpful!

System Your OS (Linux distribution, Windows, ...) and, if relevant, your Python version.

To reproduce A short example (Python script or command) reproducing the bug.

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.

marianasignal commented 2 years ago

Describe the bug

When phonemizing a text which has more than 100k utterances, it always raises a "RuntimeError" at around 900 utterances. The error messages include "espeak not installed on your system", "failed to find espeak library" and "invalid voice code 'cmn'".

Phonemizer version phonemizer-3.0 available backends: espeak-ng-1.49.2, espeak-mbrola, festival-2.5.0, segments-2.2.0

System cat /proc/version: Linux version 4.15.0-106-generic (buildd@lcy01-amd64-016) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04))

python: Python 3.9.1 (default, Dec 11 2020, 14:32:07) [GCC 7.3.0] :: Anaconda, Inc. on linux

To reproduce

    txtdict = txt2dict(text_path)
    with open(scp_path) as f:
        for line in f.readlines():
            txt = txtdict.get(line[0])
            phone = phonemize(txt, backend='espeak', language='cmn', separator=Separator(word='/', phone=' ', syllable="-"))
            rows.append([wav, new_wav, txt, phone, new_phone])
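The loop above calls phonemize once per utterance. One thing that may be worth trying, on the unconfirmed assumption that the failure is tied to the number of separate phonemize calls, is to collect all the texts first and pass them to phonemize as a single list, which it accepts in place of a string. The sketch below is not a confirmed fix. The names phonemize_batch and espeak_cmn are hypothetical helpers, and the keys are assumed to be the utterance ids read from the scp file.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def phonemize_batch(
    keys: Iterable[str],
    txtdict: Dict[str, str],
    phonemize_fn: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str, str]]:
    """Look up every text first, then phonemize all of them in ONE call.

    Keys with no text (missing or empty) are skipped so that inputs and
    outputs stay aligned one to one.
    """
    found = [(k, txtdict[k]) for k in keys if txtdict.get(k)]
    texts = [t for _, t in found]
    # A single call for the whole corpus instead of one call per utterance.
    phones = phonemize_fn(texts) if texts else []
    return [(k, t, p) for (k, t), p in zip(found, phones)]


def espeak_cmn(texts: List[str]) -> List[str]:
    """Hypothetical wrapper using the same settings as the report."""
    # Imported lazily so phonemize_batch stays usable without phonemizer.
    from phonemizer import phonemize
    from phonemizer.separator import Separator

    sep = Separator(word='/', phone=' ', syllable='-')
    return phonemize(texts, backend='espeak', language='cmn', separator=sep)
```

With the real backend this would be called as rows = phonemize_batch(keys, txtdict, espeak_cmn). Passing the phonemizer as a parameter also makes it possible to check the bookkeeping with a stand-in function before running it on the full 100k utterances.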

Expected behavior