bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Issues with diacritics in French #58

Closed: CorentinJ closed this issue 3 years ago

CorentinJ commented 3 years ago

Hello, I found some issues with the French "à" when using the espeak backend. I'm running phonemizer 2.2.1 with espeak-ng-1.49.2.

Command line:

echo "Je vais à" | phonemize -l=fr-fr -b=espeak
ʒə vɛ

The "à" is simply removed. The same goes for "é"; I haven't tried other characters.

Python:

>>> from phonemizer.phonemize import phonemize
>>> phonemize('Je vais à', language="fr-fr", backend="espeak")
'ʒə vɛz aaksɑ̃ɡʁav '

This behaviour is inconsistent with the command-line output above and is problematic in a different way: here the "à" is spelled out by name, as "a accent grave".

mmmaat commented 3 years ago

Hi, unfortunately this is an espeak-ng issue that we cannot fix in phonemizer. I suggest opening an issue there directly. (I tried with espeak-1.48, espeak-ng-1.49 and espeak-ng-1.50, and got the same result each time.)

$ echo "je vais à" | phonemize -l fr-fr -b espeak
ʒə vɛz aaksɑ̃ɡʁav 
$ echo "je vais à pied" | phonemize -l fr-fr -b espeak
ʒə vɛz a pje
$ echo "ô" | phonemize -l fr-fr -b espeak
oaksɑ̃siʁkɔ̃flɛks
$ echo "ô César" | phonemize -l fr-fr -b espeak
oː sezaʁ
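
The four outputs above suggest a pattern: the accented letter is spelled out by name when it is the last token of the input, and pronounced normally when another word follows it. Below is a minimal sketch, assuming that pattern holds, for flagging such inputs before they are sent to the backend. The helper name `ends_with_lone_accented_letter` is hypothetical and is not part of phonemizer or espeak-ng; it uses only the Python standard library and does not call espeak.

```python
import unicodedata


def ends_with_lone_accented_letter(text: str) -> bool:
    """Return True if the last whitespace-separated token of `text` is a
    single letter carrying a diacritic (for example "à" or "ô").

    Hypothetical helper for illustration only, not part of phonemizer.
    """
    tokens = text.split()
    if not tokens:
        return False
    # Normalize to NFC so "a" + combining grave accent becomes one code point.
    last = unicodedata.normalize("NFC", tokens[-1])
    if len(last) != 1 or not last.isalpha():
        return False
    # An accented letter decomposes (NFD) into a base letter plus at least
    # one combining mark; an unaccented letter does not.
    decomposed = unicodedata.normalize("NFD", last)
    return any(unicodedata.combining(ch) for ch in decomposed)


# The four inputs from the examples above.
for sentence in ["je vais à", "je vais à pied", "ô", "ô César"]:
    print(sentence, ends_with_lone_accented_letter(sentence))
```

This only detects the condition seen in the examples; it does not change what espeak produces, and it does not cover a trailing accented letter followed by punctuation.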

CorentinJ commented 3 years ago

Indeed, I was able to reproduce this with espeak alone, on the latest version. I filed an issue here: https://github.com/espeak-ng/espeak-ng/issues/854

mmmaat commented 3 years ago

Thank you!