bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Word level mapping of phonemized output #96

Open mmmaat opened 2 years ago

mmmaat commented 2 years ago

Suggested by @CorentinJ, see his implementation here.

At some point we got interested in being able to map from characters in the input text of our TTS system to its audio output. That required being able to map from an orthographic input to its phonemized output. Since your library does not provide such mappings, and since espeak doesn't seem to either, I wrote an algorithm to figure them out. It operates at the word level, and we verified that it is correct even for complex edge cases.

For instance, an example with edge cases (on the -> "ɔnðɪ", Youtubers -> "juː ɾuːbɚz"): |Youtubers| |no| |longer| |belong| |on the| |internet| becomes |juː ɾuːbɚz| |noʊ| |lɑːŋɡɚ| |bᵻlɔŋ| |ɔnðɪ| |ɪntɚnɛt|.
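To make the shape of such a mapping concrete, here is a minimal sketch that pairs the text groups with the phoneme groups from the example above. It only parses the two `|...|` strings quoted in this issue. It does not call phonemizer or espeak, and it is not the algorithm referred to above. The names `parse_groups` and `mapping` are made up for illustration.

```python
import re
from typing import List, Tuple


def parse_groups(marked: str) -> List[str]:
    """Return the chunks enclosed in |...| markers, in order of appearance."""
    return re.findall(r"\|([^|]+)\|", marked)


# Both strings are copied from the example in this issue (not computed here).
text_marked = "|Youtubers| |no| |longer| |belong| |on the| |internet|"
phones_marked = "|juː ɾuːbɚz| |noʊ| |lɑːŋɡɚ| |bᵻlɔŋ| |ɔnðɪ| |ɪntɚnɛt|"

# One (text group, phoneme group) pair per aligned unit.
mapping: List[Tuple[str, str]] = list(
    zip(parse_groups(text_marked), parse_groups(phones_marked))
)

for text_group, phone_group in mapping:
    print(f"{text_group!r} -> {phone_group!r}")
```

A group is not always a single word on either side: "on the" is two words mapped to one phonemized chunk, while "Youtubers" is one word mapped to two chunks.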

How to implement this remains to be decided:

This feature seems to be incompatible with phone/syllable separators. What about punctuation preservation?

trenslow commented 1 year ago

First off, thanks for the awesome tool. It has made my life so much easier in a lot of respects.

Has there been any progress made on this topic? It's something that could be really useful.

One simple solution I tried was to use the same word separator that I sent to .phonemize to split the original text. However, sometimes eSpeak-NG merges words, so the number of 'words' I get back from .phonemize doesn't align with the number I get back from the original text split (e.g. the "That's it, words are merged." example from the documentation).
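A rough sketch of that naive approach, for illustration only: split both sides on the same separator and pair them up when the counts agree. The phonemized strings below are hard-coded from the example earlier in this issue rather than produced by phonemizer, and `naive_align` is a hypothetical helper, not part of the library.

```python
from typing import List, Optional, Tuple


def naive_align(
    text: str, phonemized: str, separator: str = " "
) -> Optional[List[Tuple[str, str]]]:
    """Pair words with phonemized chunks, or return None if the counts differ."""
    words = text.split()
    chunks = [c for c in phonemized.split(separator) if c]
    if len(words) != len(chunks):
        return None  # a merge or split happened somewhere
    return list(zip(words, chunks))


# Counts agree and the pairs are correct.
ok = naive_align("no longer belong", "noʊ lɑːŋɡɚ bᵻlɔŋ")

# "on the" was merged into one chunk: 3 words against 2 chunks.
merged = naive_align("on the internet", "ɔnðɪ ɪntɚnɛt")

# Full example: one split ("Youtubers") and one merge ("on the") cancel out,
# so the counts agree even though the pairs are shifted.
shifted = naive_align(
    "Youtubers no longer belong on the internet",
    "juː ɾuːbɚz noʊ lɑːŋɡɚ bᵻlɔŋ ɔnðɪ ɪntɚnɛt",
)
```

The last case shows that equal counts are not enough to trust the pairing: both sides have seven items, yet "no" ends up paired with "ɾuːbɚz".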

I don't think the merging is configurable on the eSpeak side, as the merging comes from the underlying pronunciation dictionaries. So what remains is sending the split words one-by-one. This also isn't perfect, as you lose information about e.g. sandhi effects.
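Sending the words one by one could look roughly like the sketch below. To keep it runnable without espeak, the phonemizer is passed in as a callable and a small fake lexicon stands in for it. `phonemize_word_by_word`, `fake_phonemize` and `FAKE_LEXICON` are hypothetical names, and the lexicon entries are copied from the example earlier in this issue.

```python
from typing import Callable, Dict, List, Tuple


def phonemize_word_by_word(
    text: str, phonemize_fn: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Phonemize each whitespace-separated word in isolation.

    The alignment is trivially one-to-one, but any cross-word effect
    (merging, sandhi) is lost because each word is processed alone.
    """
    return [(word, phonemize_fn(word)) for word in text.split()]


# Stand-in for a real backend; entries taken from the example in this issue.
FAKE_LEXICON: Dict[str, str] = {
    "no": "noʊ",
    "longer": "lɑːŋɡɚ",
    "belong": "bᵻlɔŋ",
}


def fake_phonemize(word: str) -> str:
    return FAKE_LEXICON[word.lower()]


pairs = phonemize_word_by_word("No longer belong", fake_phonemize)
```

With phonemizer installed, `phonemize_fn` could presumably be something like `lambda w: phonemize(w, language="en-us", backend="espeak", strip=True)`, using `phonemize` imported from `phonemizer`. That wiring is an assumption and has not been checked against a particular version of the library.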

Maybe there's some information flowing from eSpeak about which words are merged which could be provided to phonemizer users? That would allow people to at least have the choice of what to do about that information.

mmmaat commented 1 year ago

No progress so far... Did you try the code by @CorentinJ here?

trenslow commented 1 year ago

I didn't, as my use case is for languages other than English.

CorentinJ commented 1 year ago

Word-level mappings should work for all languages with the algo I provided.

trenslow commented 1 year ago

Ah ok! I will explore it ASAP.

mmmaat commented 1 year ago

If someone wants to open a PR for this, that would be great. I have no time for this project in the next few months...