bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

Mapping phonemes back to words #65

Closed tdeboissiere closed 3 years ago

tdeboissiere commented 3 years ago

Hi,

Is there a straightforward way to map each phoneme back to the word it belongs to?

Example:

Input text: with the cat
Output IPA: wɪððə kˈæt

then

wɪð belongs to `with`
ðə belongs to `the`
kˈæt belongs to `cat`

mmmaat commented 3 years ago

Hi, not at all. This is a destructive operation because of homophones (different words with the same pronunciation).

tdeboissiere commented 3 years ago

Sorry, my statement was not clear.

The problem assumes that you have both the input text and the IPA transcription. In that case it should not be impacted by homophones.

For instance:

input text: sell cell
output IPA: sˈɛl sˈɛl

-> we know the first sˈɛl corresponds to `sell`, and the second to `cell`

mmmaat commented 3 years ago

Oh OK! So you need a function taking (text, phonemized_text, separator) as input and outputting a dict of word: phonemized_word.

Well, in this case it is surely possible, but I never implemented it. I suspect some tricky corner cases, for instance with numbers, or if espeak "eats" a word or pronounces it in another language, etc.

I do not plan to implement it myself but if you do, you're welcome to do a PR or share it here ;)
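
For illustration, the function described above, taking (text, phonemized_text, separator), could be sketched as below. This is not part of phonemizer; the name `map_words_to_phones` is hypothetical, and the sketch assumes that words and phonemized tokens line up one to one by position. It returns a list of pairs instead of a dict so that repeated words keep separate entries, and it raises an error when the counts differ instead of guessing.

```python
def map_words_to_phones(text, phonemized_text, separator=" "):
    """Pair each input word with its phonemized form, by position.

    Hypothetical helper, not part of phonemizer. Assumes one phonemized
    token per input word; raises ValueError otherwise (for example when
    the backend merged or dropped a word).
    """
    words = text.split()
    # Drop empty tokens left by leading or trailing separators.
    phones = [p for p in phonemized_text.split(separator) if p]
    if len(words) != len(phones):
        raise ValueError(
            f"cannot align {len(words)} words with {len(phones)} phonemized tokens"
        )
    return list(zip(words, phones))


# Homophones are not a problem because the pairing is positional.
pairs = map_words_to_phones("sell cell", "sˈɛl sˈɛl")
```

With the first example from this thread, `with the cat` against `wɪððə kˈæt`, the sketch raises `ValueError`, because the output has two tokens for three words. That is one of the corner cases mentioned above.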

tdeboissiere commented 3 years ago

Thanks, will let you know if I ever implement it!

shreeshailgan commented 1 month ago

@tdeboissiere Were you able to implement this method? How did you deal with espeak merging short words like "for the" when phonemizing?
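
One possible workaround for merged words, not confirmed anywhere in this thread, is to phonemize each word on its own so that the word boundaries are known in advance. The sketch below assumes a hypothetical callable `phonemize_fn` that takes one string and returns its phonemized form, and it uses a small stand-in lexicon in place of a real backend. Words phonemized in isolation may come out differently than they would inside a sentence, so this trades context for alignment.

```python
def phonemize_per_word(text, phonemize_fn):
    """Phonemize each whitespace-separated word separately.

    `phonemize_fn` is a hypothetical callable: one string in, its
    phonemized form out. Because each word is processed alone, the
    backend has no neighbouring word to merge it with.
    """
    return [(word, phonemize_fn(word).strip()) for word in text.split()]


# Stand-in for a real phonemizer call, for illustration only.
fake_lexicon = {"with": "wɪð", "the": "ðə", "cat": "kˈæt"}
pairs = phonemize_per_word("with the cat", fake_lexicon.__getitem__)
```

In practice `phonemize_fn` could be a thin wrapper around `phonemizer.phonemize`, though calling a backend once per word is likely slower than phonemizing the whole text in a single call.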