bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Phonemize gets stuck on some sentences #126

Closed DasAnish closed 1 year ago

DasAnish commented 2 years ago

'ज्ञान' (just one word) is enough to trigger it.

Similar issues happen in Bengali as well, but I have not looked at them closely.

hadware commented 2 years ago

What backend are you using?

Could you provide us with some more context: OS, phonemizer version, backend used, and if you're using it from the python API, some code sample to reproduce the error.

skanda1005 commented 2 years ago

Hi, I am facing a similar issue in Hindi. I am running on Linux, with phonemizer version 3.2.1, using the Python API with espeak as the backend.

Code:

from phonemizer import phonemize
from phonemizer.separator import Separator

text = 'जासूस'
phn = phonemize(
    text,
    language='hi',
    backend='espeak',
    separator=Separator(phone=None, word=' ', syllable='|'),
    njobs=4)

intellisr commented 2 years ago

Hi, I have this issue too, in Sinhala, using the Python API with espeak as the backend.

hadware commented 2 years ago

Could you give me the output of phonemize --version, as well as the exact OS you are using (which Linux distribution)?

intellisr commented 2 years ago

@hadware This is the output. The OS is Ubuntu 18.04.

phonemizer-2.2.2
available backends: espeak-ng-1.49.2, segments-2.2.0
uninstalled backends: espeak-mbrola, festival

This is my code:

phonemes = phonemize(text,
                     language=language,
                     backend='espeak',
                     strip=True,
                     preserve_punctuation=True,
                     with_stress=with_stress,
                     punctuation_marks=self.punctuation,
                     njobs=njobs,
                     language_switch='remove-flags')

hadware commented 2 years ago

OK. Could you try updating phonemizer to the latest version (3.0.1)? I tried to reproduce this error on my setup and couldn't. This is my setup:

(On ubuntu 20.04)
phonemizer-3.0.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0

To update phonemizer, run pip install -U phonemizer

intellisr commented 2 years ago

@hadware I have done that, but it is still not working.

Every time, it gets stuck on the same line:

භුගෝලීය වශයෙන් බැලූ කළද ශාන්ත කීටස් හා නේවිස් යනු ලීවර්ඩ් දූපත් හි කොටසක් වේ

Only some texts fail; others work.
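Since only some inputs hang, one way to find them without blocking a whole batch is to run each text in a child process with a time limit. This is a minimal sketch, not part of phonemizer: `run_with_timeout` is a hypothetical helper, and `PHONEMIZE_CMD` assumes the `phonemize` command-line tool is on the PATH and reads its input from stdin.

```python
import subprocess
import sys


def run_with_timeout(command, text, timeout_s=10.0):
    """Feed `text` to `command` on stdin; return its stdout, or None on timeout.

    Hypothetical helper: subprocess.run kills the child process when the
    timeout expires, so a hung backend does not block the caller.
    """
    try:
        result = subprocess.run(
            command,
            input=text,
            capture_output=True,
            text=True,
            encoding="utf-8",
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Assumed CLI invocation (language and backend taken from this thread).
PHONEMIZE_CMD = ["phonemize", "-l", "si", "-b", "espeak"]

# Example: collect the lines that never return.
# stuck = [line for line in lines
#          if run_with_timeout(PHONEMIZE_CMD, line) is None]
```

A `None` result only says the call did not finish in time; it does not say why.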

Mihir-Gajera1 commented 1 year ago

It is getting stuck on some Hindi words. Can someone please help? Examples: रुचि , दिनांक , नौ

mmmaat commented 1 year ago

Hi, I suspect this to be an espeak bug, not a phonemizer one. So please make sure you have espeak-ng-1.50 installed.

Here is what I have:

$ phonemize --version
phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3

And the following script works:

from phonemizer import phonemize
from phonemizer.separator import Separator

# gives 'ɟaːsuːs '
phonemize('जासूस', language='hi', backend='espeak')

# gives 'ɾʊcɪ dɪnãk nɔː '
phonemize('रुचि , दिनांक , नौ', language='hi', backend='espeak')

# gives 'bʰuɡoːliːjə wɐsəjen bæluː kɐɭədə saːntə kiːʈəs haː neːwis jɐnu liːwərɖ duːpət hi koʈəsək weː '
phonemize('භුගෝලීය වශයෙන් බැලූ කළද ශාන්ත කීටස් හා නේවිස් යනු ලීවර්ඩ් දූපත් හි කොටසක් වේ', language='si', backend='espeak')

Mihir-Gajera1 commented 1 year ago

Hi, thanks for the reply. This is indeed an espeak-ng version issue. On CentOS 8, the latest available version is 1.49.2. Building espeak-ng from source solves the issue.
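For anyone landing here later, the check that settled this thread can be scripted. The sketch below parses the text printed by phonemize --version and flags an espeak-ng older than 1.50. The function names are hypothetical, and the 1.50 threshold is only what the reports above suggest, not a documented minimum.

```python
import re

# Oldest espeak-ng version reported as working in this thread
# (an assumption drawn from the comments, not an official requirement).
MIN_WORKING = (1, 50)


def espeak_ng_version(version_output):
    """Return the espeak-ng version found in `phonemize --version` output.

    The result is a tuple of ints such as (1, 49, 2), or None if the
    output does not mention espeak-ng.
    """
    match = re.search(r"espeak-ng-(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_old_espeak_ng(version_output, minimum=MIN_WORKING):
    """True if an espeak-ng version is listed and it is below `minimum`."""
    version = espeak_ng_version(version_output)
    return version is not None and version < minimum


# Output lines quoted earlier in this thread.
failing = "available backends: espeak-ng-1.49.2, segments-2.2.0"
working = "available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3"
```

With the two lines quoted in this thread, `is_old_espeak_ng(failing)` is true and `is_old_espeak_ng(working)` is false, which matches the setups that hung and the one that did not.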