bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Does not produce phonemes for few hindi words #63

Closed (john-2424 closed this issue 3 years ago)

john-2424 commented 3 years ago

With the espeak-ng backend, phonemizer does not produce phonemes for some Hindi words, such as तूफ़ान and आठ. What is the problem, and how can I resolve it? There might be other words that it fails to process as well.

Code:

from phonemizer import phonemize

text = "तूफ़ान"
print(phonemize(text, language="hi", backend="espeak"))

No error, no output, just keeps executing!

mmmaat commented 3 years ago

Hi, what version of espeak-ng are you using? It works for me:

$ phonemize --version
phonemizer-2.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
$ echo 'तूफ़ान' | phonemize -b espeak -l hi
tuːfaːn 
$ echo 'आठ' | phonemize -b espeak -l hi
aːʈʰ

john-2424 commented 3 years ago

@mmmaat This is the version of phonemizer and espeak-ng:

!pip install phonemizer
!sudo apt-get install festival espeak-ng mbrola

!phonemize --version
phonemizer-2.2.2
available backends: espeak-ng-1.49.2, espeak-mbrola, festival-2.5.0, segments-2.2.0

I tried to install the newer version of espeak-ng, but installing espeak-ng-1.50 fails with this error:

E: Unable to locate package espeak-ng=
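
The two version reports in this thread differ in the espeak-ng backend: 1.50 where the words are phonemized, 1.49.2 where they are not. As a minimal sketch, the backend version could be pulled out of the `phonemize --version` text and compared. The helper name `parse_espeak_version` is hypothetical, and the 1.50 threshold is only what this thread suggests, not a documented minimum.

```python
import re


def parse_espeak_version(version_output):
    """Return the espeak-ng version found in `phonemize --version` output
    as a tuple of ints, or None if no espeak-ng backend is listed."""
    match = re.search(r"espeak-ng-(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Text copied from the version report above.
reported = (
    "phonemizer-2.2.2 available backends: espeak-ng-1.49.2, "
    "espeak-mbrola, festival-2.5.0, segments-2.2.0"
)

version = parse_espeak_version(reported)
# Assumption from this thread only: 1.50 handles the Hindi words, 1.49.2 does not.
if version is not None and version < (1, 50):
    print("espeak-ng is older than 1.50:", version)
```

Tuples compare element by element, so (1, 49, 2) sorts before (1, 50) without any string comparison pitfalls.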

mmmaat commented 3 years ago

OK, I guess you are on an old version of Ubuntu (on Ubuntu 20.04 the packaged version of espeak-ng is 1.50). One solution is to manually download, compile, and install espeak-ng-1.50 on your system. See https://github.com/espeak-ng/espeak-ng/blob/master/docs/building.md#linux-mac-bsd. Once it is installed, you can use the environment variable PHONEMIZER_ESPEAK_PATH to specify where the new espeak-ng is installed.
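
A minimal sketch of that last step from Python, assuming PHONEMIZER_ESPEAK_PATH should hold the path to the espeak-ng executable (as I understand it for phonemizer 2.2.x; check the documentation for your version). The helper name `point_phonemizer_at_espeak` and the install location `/usr/local/bin/espeak-ng` are illustrative assumptions, not something confirmed in this thread.

```python
import os


def point_phonemizer_at_espeak(executable, environ=None):
    """Set PHONEMIZER_ESPEAK_PATH to `executable` after checking that the
    file exists and is executable. Returns the path that was set."""
    environ = os.environ if environ is None else environ
    if not (os.path.isfile(executable) and os.access(executable, os.X_OK)):
        raise FileNotFoundError(f"no executable espeak-ng at {executable!r}")
    environ["PHONEMIZER_ESPEAK_PATH"] = executable
    return executable


# Usage (hypothetical path; adjust to wherever your build was installed).
# Setting the variable before importing phonemizer is the cautious order.
#
#   point_phonemizer_at_espeak("/usr/local/bin/espeak-ng")
#   from phonemizer import phonemize
#   print(phonemize("तूफ़ान", language="hi", backend="espeak"))
```

Checking the path first turns a silent fallback to the old system espeak-ng into an explicit error. In a notebook, the same variable can be set with `os.environ` in a cell that runs before the first call to `phonemize`.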

john-2424 commented 3 years ago

Thank you, I will try this on Ubuntu. But what if I want to use espeak-ng on Google Colab? The espeak-ng I installed, and the version I shared earlier, were on Colab, not on Ubuntu.

mmmaat commented 3 years ago

I don't know, I have never used Colab...

123srikanth commented 2 years ago

Hi there. I have created a data folder for the espnet/egs/ljspeech/tts1 model for Hindi. Inside it, the text files contain the phonemes generated with phonemizer for each corresponding line of text. This mirrors the data folder that the model creates for its default data when trans_type is phn, so I built mine in the same manner. I am facing a lot of errors with the espnet2/egs/ljspeech/tts model, so I want to go with this method instead.

Can you please tell me whether this is the right approach and whether it is likely to work, so that I can proceed with my training? All the pre-training stages ran successfully, and I just want confirmation that what I am doing is right before going further.

Also, I generated phonemes using both phonemizer and phonemizer_EspeakBackend, and the two outputs differ slightly. Can you please clarify which phonemes I should use: those generated by phonemizer or those generated by phonemizer_EspeakBackend?

/home/srikanth/Documents/train_data/data_phonemizer.zip
/home/srikanth/Documents/train_data/data_phonemizer_espeakbackend.zip