bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Edge case where multiline outputs get merged with preserve_punctuation and custom punctuation marks #55

CorentinJ closed 3 years ago

CorentinJ commented 3 years ago

The following

from phonemizer import phonemize

phonemize(['"Hey! "', '"hey,"'], backend="espeak", preserve_punctuation=True, punctuation_marks='.!;:,?')

gives [' heɪ ! heɪ ', ','] as output (the second sentence got merged into the first), whereas [' heɪ ! ', 'heɪ,'] was expected.
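A merge like this can be spotted mechanically by comparing, line by line, how many word-like tokens go in and how many come out. The snippet below is only a rough sketch of that idea: `count_words` and `merged_lines` are hypothetical helpers, not part of phonemizer, and they assume each input word yields exactly one output word, which may not hold for inputs the backend expands (digits, for instance). It runs on the outputs quoted above and does not call phonemizer itself.

```python
def count_words(line):
    """Count whitespace-separated tokens containing at least one letter.

    Hypothetical helper: tokens made only of punctuation or quotes are ignored.
    """
    return sum(1 for token in line.split() if any(ch.isalpha() for ch in token))


def merged_lines(inputs, outputs):
    """Return the indices of lines whose word count changed during phonemization.

    Assumption: one input word maps to one output word.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same number of lines")
    return [
        index
        for index, (source, result) in enumerate(zip(inputs, outputs))
        if count_words(source) != count_words(result)
    ]


inputs = ['"Hey! "', '"hey,"']
reported = [' heɪ ! heɪ ', ',']   # output quoted in this report
expected = [' heɪ ! ', 'heɪ,']    # output the reporter expected

print(merged_lines(inputs, reported))  # [0, 1]
print(merged_lines(inputs, expected))  # []
```

Both lines are flagged for the reported output because the first line gained a word and the second lost one.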

mmmaat commented 3 years ago

OK, thanks for reporting. Actually, this is even worse...

>>> from phonemizer import phonemize
>>> phonemize(['! ?', 'hey!'], backend="espeak", preserve_punctuation=True, punctuation_marks='!') 
['! heɪ ', '!']
>>> phonemize(['?', 'hey!'], backend="espeak", preserve_punctuation=True, punctuation_marks='!')
['heɪ ', '!']

I'll look at a fix.