bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Festival backend ignores apostrophes #12

Closed yvt closed 6 years ago

yvt commented 6 years ago
$ echo "I'm looking for a job as a digital assistant." | phonemize
ihm luhkaxng faor ax jhaab aez ax dihjhaxtaxl axsihstaxnt

$ echo "Im looking for a job as a digital assistant." | phonemize
ihm luhkaxng faor ax jhaab aez ax dihjhaxtaxl axsihstaxnt

mmmaat commented 6 years ago

Hi,

This is not a bug but the expected behavior. The apostrophe is not displayed in the output, but it is still pronounced. For example:

https://github.com/bootphon/phonemizer/blob/c41e5e10c35f1ce8279565f2f380dd66a59a288a/test/test_festival.py#L43-L47

yvt commented 6 years ago

So you are saying that, contrary to what most people and the New Oxford American Dictionary say (I'm |aɪm|), pronouncing "I'm" as |ɪm|, as in "image" and "imitate", is the expected behavior?

mmmaat commented 6 years ago

OK, thank you for pointing that out: it is indeed a bug, and one I introduced recently. In a former version we had:

$ echo "I'm looking for an image." | phonemize
aym luhkaxng faor axn ihmaxjh
$ echo "Im looking for an image." | phonemize
ihm luhkaxng faor axn ihmaxjh 

In the latest release we have:

$ echo "I'm looking for an image." | phonemize
ihm luhkaxng faor axn ihmaxjh
$ echo "Im looking for an image." | phonemize
ihm luhkaxng faor axn ihmaxjh 

I will correct that very soon. Note that the bug only affects the festival backend; you can still use the espeak one:

$ echo "I'm looking for an image." | phonemize -l en-gb
aɪm lʊkɪŋ fəɹən ɪmɪdʒ
$ echo "Im looking for an image." | phonemize -l en-gb
ɪm lʊkɪŋ fəɹən ɪmɪdʒ

mmmaat commented 6 years ago

Fixed in commit https://github.com/bootphon/phonemizer/commit/a2af4801aafb2c0032b955701865d168379462b1
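
The symptom reported in this thread is that "I'm" and "Im" produce identical output. A minimal sketch of a check for that, assuming only that the command reads text on stdin as in the examples above: the helper name `apostrophe_respected` is hypothetical and not part of phonemizer, and it only detects identical output, not whether the phones are correct.

```shell
# Hypothetical helper (not part of phonemizer): runs the given command
# on two sentences that differ only by an apostrophe and succeeds if
# the two outputs differ.
apostrophe_respected() {
    # "$@" is the command under test plus its arguments; it must read
    # text on stdin and write its result to stdout.
    out_with=$(printf '%s\n' "I'm looking for an image." | "$@")
    out_without=$(printf '%s\n' "Im looking for an image." | "$@")
    [ "$out_with" != "$out_without" ]
}
```

It could be used as `apostrophe_respected phonemize && echo ok`, or with the flags shown earlier in the thread, for example `apostrophe_respected phonemize -l en-gb`.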