bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Disparity between backends with punctuation #137

Open · agkphysics opened 2 years ago

agkphysics commented 2 years ago

Describe the bug

With the default preserve_punctuation=False, the Festival backend silently drops inputs that contain only punctuation (the returned list is shorter than the input list), whereas the espeak backend returns an empty string for each such input.

Phonemizer version

phonemizer-3.2.1
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1

System

Ubuntu 20.04.4, Linux kernel 5.15.0, Python 3.8.10

To reproduce

from phonemizer import phonemize

print(phonemize([".", "."], language="en-us", backend="festival"))
print(phonemize([".", "."], language="en-us", backend="espeak"))
print(phonemize([".", "."], language="mb-us1", backend="espeak-mbrola"))

Yields output

[]
['', '']
['', '']

Expected behavior

All three backends should return one empty string per input:

['', '']
['', '']
['', '']
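
The expected behavior amounts to an invariant: phonemize should return exactly one string per input string, whatever the backend. The sketch below shows one way such an invariant could be checked in a regression test. The helper check_one_output_per_input and the two stub functions are hypothetical, not part of phonemizer; the stubs only replay the outputs reported above for [".", "."], so the example runs without any backend installed. Against the real library one would pass something like `lambda texts: phonemize(texts, language="en-us", backend="festival")` instead of a stub.

```python
from typing import Callable, List, Sequence

# Any function that maps a list of input strings to a list of output strings.
PhonemizeFn = Callable[[List[str]], List[str]]


def check_one_output_per_input(phonemize_fn: PhonemizeFn, texts: Sequence[str]) -> List[str]:
    """Run phonemize_fn on texts and raise AssertionError if the output
    does not contain exactly one string per input string."""
    outputs = phonemize_fn(list(texts))
    if len(outputs) != len(texts):
        raise AssertionError(
            f"expected {len(texts)} outputs, got {len(outputs)}: {outputs!r}"
        )
    return outputs


# Hypothetical stubs replaying the outputs reported for [".", "."];
# they are NOT the real backends.
def festival_like(texts: List[str]) -> List[str]:
    return []  # reported Festival output: the inputs are dropped


def espeak_like(texts: List[str]) -> List[str]:
    return ["" for _ in texts]  # reported espeak output: one "" per input


print(check_one_output_per_input(espeak_like, [".", "."]))  # prints ['', '']
```

Calling check_one_output_per_input(festival_like, [".", "."]) raises AssertionError with the message "expected 2 outputs, got 0: []", which is the Festival behavior reported in this issue.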

mmmaat commented 1 year ago

Hi, with preserve_punctuation=True another bug occurs: the Festival and espeak backends merge the two inputs into a single output string.

from phonemizer import phonemize

print(phonemize([".", "."], language="en-us", backend="festival", preserve_punctuation=True))
print(phonemize([".", "."], language="en-us", backend="espeak", preserve_punctuation=True))
print(phonemize([".", "."], language="mb-us1", backend="espeak-mbrola", preserve_punctuation=True))

Yields

['..']
['..']
['', '']

But the output should be as follows (espeak-mbrola does not support punctuation, so empty strings are still expected for it):

['.', '.']
['.', '.']
['', '']
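
Until the backends agree, a caller could work around both bugs by never sending punctuation-only inputs to the backend and filling in placeholders afterwards, so the output stays aligned with the input. This is a minimal sketch under stated assumptions: phonemize_aligned and is_punctuation_only are hypothetical helpers, not part of phonemizer; "punctuation-only" is approximated with Python's string.punctuation, which covers ASCII punctuation only; and the backend is assumed to return one output per input when every input contains at least one non-punctuation character. The backend is passed in as a function so the example runs with a stub. With the real library it could be something like `lambda texts: phonemize(texts, language="en-us", backend="festival", preserve_punctuation=True)`, using the same preserve_punctuation value as the wrapper (and False for espeak-mbrola, which does not support punctuation).

```python
import string
from typing import Callable, List, Sequence, Tuple

# Any function that maps a list of input strings to a list of output strings.
PhonemizeFn = Callable[[List[str]], List[str]]


def is_punctuation_only(text: str) -> bool:
    """True if text contains nothing but ASCII punctuation and whitespace
    (this includes the empty string)."""
    return all(c in string.punctuation or c.isspace() for c in text)


def phonemize_aligned(
    phonemize_fn: PhonemizeFn,
    texts: Sequence[str],
    preserve_punctuation: bool = False,
) -> List[str]:
    """Return exactly one output string per input string.

    Punctuation-only inputs bypass phonemize_fn: they map to "" or, when
    preserve_punctuation is True, to the input with surrounding whitespace stripped.
    """
    results: List[str] = [""] * len(texts)
    pending: List[Tuple[int, str]] = []  # (position, text) pairs for the backend
    for i, text in enumerate(texts):
        if is_punctuation_only(text):
            results[i] = text.strip() if preserve_punctuation else ""
        else:
            pending.append((i, text))

    if pending:
        outputs = phonemize_fn([text for _, text in pending])
        if len(outputs) != len(pending):
            raise ValueError(
                f"backend returned {len(outputs)} outputs for {len(pending)} inputs"
            )
        # Put each backend output back at the position of its input.
        for (i, _), out in zip(pending, outputs):
            results[i] = out
    return results


# Hypothetical stub backend: upper-cases each input instead of phonemizing it.
def fake_backend(texts: List[str]) -> List[str]:
    return [text.upper() for text in texts]


print(phonemize_aligned(fake_backend, [".", "."]))  # ['', '']
print(phonemize_aligned(fake_backend, [".", "."], preserve_punctuation=True))  # ['.', '.']
print(phonemize_aligned(fake_backend, ["hello", "."], preserve_punctuation=True))  # ['HELLO', '.']
```

Inputs that mix words and punctuation, such as "hello.", still go through the backend unchanged, so this only covers the punctuation-only case reported here.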