bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Problems about Mandarin phoneme #90

Closed: mynah15 closed this issue 1 year ago

mynah15 commented 2 years ago

When I run phonemize with -l cmn or zh, the output is in the International Phonetic Alphabet (IPA) instead of Mandarin phonemes.

Is this a problem with my command or a bug? How can I convert Chinese text to Mandarin phonemes? I look forward to your reply, thanks!

Expected behavior: I am trying to convert Chinese to Mandarin phonemes with this command:

cat chinese.txt | PHONEMIZER_ESPEAK_PATH=$(which espeak) phonemize -o train_out.phn -p ' ' -w '' -l zh -j 70 --language-switch remove-flags

Result: fatal error: language "zh" is not supported by the espeak backend. I then checked: espeak --voices includes Mandarin, and phonemize --list-languages reports: cmn -> Chinese (Mandarin)

Phonemizer version: phonemizer-3.0, espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0

System: Ubuntu 20.04

mmmaat commented 2 years ago

Hi, if you use phonemizer with the cmn language flag, do you get the result you expect? It is possible that "zh" for espeak is seen as "cmn" by phonemizer (because of that).

Finally, I don't know which Mandarin phoneme set you are expecting. Is it different from IPA? If so, you cannot get it here, because espeak (and so phonemizer) only outputs IPA phonemes.
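
The language-code point can be sketched in Python. This is a minimal sketch, not phonemizer's own logic: `LANGUAGE_ALIASES`, `resolve_language`, and `phonemize_mandarin` are hypothetical names, and the `zh` to `cmn` mapping is an assumption taken from this thread, where `phonemize --list-languages` reports `cmn -> Chinese (Mandarin)`. The output of the espeak backend is still IPA.

```python
# Hypothetical alias table: an assumption based on this thread, where
# `phonemize --list-languages` reports "cmn -> Chinese (Mandarin)".
LANGUAGE_ALIASES = {"zh": "cmn"}


def resolve_language(code):
    """Map a user-supplied language code to the one espeak is assumed to list."""
    normalized = code.strip().lower()
    return LANGUAGE_ALIASES.get(normalized, normalized)


def phonemize_mandarin(text, language="zh"):
    """Phonemize text with the espeak backend; the result is IPA."""
    # Imported lazily so resolve_language works without phonemizer installed.
    from phonemizer import phonemize
    from phonemizer.separator import Separator

    return phonemize(
        text,
        language=resolve_language(language),
        backend="espeak",
        # Same separators as `-p ' ' -w ''` in the command from the report.
        separator=Separator(phone=" ", word=""),
        language_switch="remove-flags",
    )
```

Calling `phonemize_mandarin("你好")` requires phonemizer and espeak-ng to be installed; `resolve_language` alone has no dependencies.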

marianasignal commented 2 years ago

Hi, I wonder what the content of your input is. Is it pinyin (ni3 hao3) or characters (你好)?

And this is my example:

image

Which one is correct if I only need to get the position of a phone?

mynah15 commented 2 years ago

Thanks for your help! By Mandarin phoneme I mean the basic unit of pronunciation, which can be used to train an HMM-GMM model. It sounds similar to IPA but is not exactly the same, so the IPA pronunciation sounds like a dialect and is even hard to understand. I found the command (cat chinese.txt | PHONEMIZER...) on the Internet; someone got the desired result with it and said phonemize worked very well. As you said, if espeak and phonemizer only output IPA phonemes, I may not be able to get the right result this way.

mynah15 commented 2 years ago

Thank you for your reply. My input is characters (你好), and I hope to get phonemes, e.g. 你好 --> n i3 h ao3. Pinyin (ni3 hao3) is different from phonemes: a phoneme is the basic unit of pronunciation, while pinyin is a combination of phonemes. As a rough analogy, the phonetic transcription of "hello" is [həˈləʊ], while its phonemes would look like [h ə1 l əʊ5]. Someone on the Internet got the desired result, so I tried his command (cat chinese.txt | PHONEMIZER....), but it did not run successfully. Do you know how to get the correct phonemes? Thanks.
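
The target format in the example (n i3 h ao3) corresponds to splitting each numbered pinyin syllable into an initial and a tone-marked final. Here is a minimal sketch in Python, assuming the text has already been converted from characters to numbered pinyin by some other tool (not shown). `INITIALS`, `split_syllable`, and `pinyin_to_units` are illustrative names, not part of phonemizer, and treating y and w as initials is one convention among several.

```python
# Pinyin initials; two-letter initials come first so "zh" wins over "z".
# Treating "y" and "w" as initials is an assumption of this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")
VOWELS = "aeiouvü"


def split_syllable(syllable):
    """Split one numbered pinyin syllable into [initial, final+tone]."""
    for initial in INITIALS:
        rest = syllable[len(initial):]
        # Only split when a vowel-initial final follows the initial.
        if syllable.startswith(initial) and rest and rest[0] in VOWELS:
            return [initial, rest]
    # No initial (e.g. "er2", "a1"): keep the syllable as a single unit.
    return [syllable]


def pinyin_to_units(text):
    """Convert space-separated numbered pinyin into space-separated units."""
    units = []
    for syllable in text.split():
        units.extend(split_syllable(syllable))
    return " ".join(units)


print(pinyin_to_units("ni3 hao3"))  # n i3 h ao3
```

The unit inventory a given HMM-GMM recipe expects may differ, so the split rules here would need to match that recipe's lexicon.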

mmmaat commented 2 years ago

Maybe the source where you found this example on the Internet explains which version of phonemizer they used? You could try phonemizer-2.2.2, as version 3.0 changed a lot of things. It is not unlikely that the cmn/zh difference comes from that.

mynah15 commented 2 years ago

Thanks. I can't find any information about the version of phonemizer they used. I will try different versions of phonemizer and espeak.