bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Request: an option to keep empty lines #95

Closed jncasey closed 2 years ago

jncasey commented 2 years ago

Is your feature request related to a problem? Please describe.
I'd like to use this library on a project related to poetry and song lyrics, where empty lines as separators are an important part of the data.

Describe the solution you'd like
It'd be great to add a flag to the phonemize method called something like keep_empty_lines, that would default to False to preserve current behavior, but could be enabled to get my desired behavior. I'm not sure if it's as simple as just adding a conditional around this line, or if passing empty lines to any of the backends could lead to unexpected/bad behavior.

Additional context
I'm using the festival backend, if that makes a difference (to take advantage of its syllable separators).

mmmaat commented 2 years ago

Hi, this option is indeed possible (and easy) to implement. It would be a preserve_empty_lines=False argument in phonemize() and --preserve-empty-lines from the command line.

I do not have time to do this currently but, if you want to submit a pull request with your modifications, please do it :).

jncasey commented 2 years ago

Sure, I can give it a shot in the next week or so.

I'm not familiar with the backends. Will they return an empty line if passed an empty line, or will it be necessary to strip out the empty lines from the input and reinsert them post-phonemizing?

mmmaat commented 2 years ago

Ok great!

All the backends keep empty lines as empty (see https://github.com/bootphon/phonemizer/blob/master/CHANGELOG.md#phonemizer-30; if not, this is a bug). So maybe your work will just be to add an if somewhere (and to code the option and possibly a few tests to make sure it is working for all the backends...)

jncasey commented 2 years ago

Quick update: I thought I had a nice simple solve for this, but I was working on my laptop, which didn't have access to the festival backend. It turns out that festival does not like empty lines. That led me to make a couple of tweaks to the festival backend code, but then there were still problems when preserving punctuation.

I think the less disruptive solution is going to be extracting and reinserting the blank lines in the top-level _phonemize method, so I'm going to scrap my current code and switch to that strategy instead.
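
For readers following along, the extract-and-reinsert idea could look roughly like the sketch below. This is not the code from the pull request: the function name phonemize_keeping_blank_lines and the phonemize_fn callable are hypothetical stand-ins, and the sketch assumes the backend returns exactly one output line per non-empty input line.

```python
from typing import Callable, List


def phonemize_keeping_blank_lines(
    lines: List[str],
    phonemize_fn: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run phonemize_fn on the non-empty lines only, then restore blank lines.

    phonemize_fn is a hypothetical stand-in for a backend call; it is assumed
    to return exactly one output line per input line.
    """
    # Remember the positions of lines that are empty or whitespace-only.
    blank_positions = {i for i, line in enumerate(lines) if not line.strip()}

    # The backend only ever sees lines that have content.
    non_empty = [line for i, line in enumerate(lines) if i not in blank_positions]
    phonemized = phonemize_fn(non_empty)
    if len(phonemized) != len(non_empty):
        raise ValueError("backend must return one output line per input line")

    # Rebuild the output in the original order, emitting '' for blank lines.
    remaining = iter(phonemized)
    return ["" if i in blank_positions else next(remaining) for i in range(len(lines))]


if __name__ == "__main__":
    # A fake backend that upper-cases text, used here instead of a real one.
    verse = ["roses are red", "", "violets are blue"]
    print(phonemize_keeping_blank_lines(verse, lambda ls: [l.upper() for l in ls]))
```

Because the backend never receives a blank line, this approach sidesteps backend-specific behavior such as the festival problem described above, at the cost of a little bookkeeping at the top level.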

jncasey commented 2 years ago

Closed by #103