bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0

How to get token separated phones for a given sentence in python? #84

ambiSk closed this issue 2 years ago

ambiSk commented 2 years ago

My code looks like this:

from phonemizer import phonemize

phonemize(
    text='हमने उसका जन्मदिन मनाया',
    language='hi',
    backend='espeak',
)

The output looks like this:

Out: 'hʌmneː ʊskaː ɟʌnmədɪn mənaːjaː '

I came across a class called Separator, but I don't know how to pass it to this function to get token-separated output.

Example: if the token separator is -, then the output I expect for the above sentence is:

Out: 'h-ʌ-m-n-eː- -ʊ-s-k-aː- -ɟ-ʌ-n-m-ə-d-ɪn- -m-ə-n-aː-j-aː- -'

The separators around the spaces between words (- -) can be ignored.

mmmaat commented 2 years ago

Hi, you can get exactly what you want. Have a look:

from phonemizer import phonemize
from phonemizer.separator import Separator

phonemize(
    text='हमने उसका जन्मदिन मनाया',
    language='hi',
    backend='espeak',
    separator=Separator(phone='-', word=' -'),
)

You'll get 'h-ʌ-m-n-eː- -ʊ-s-k-aː- -ɟ-ʌ-n-m-ə-d-ɪ-n- -m-ə-n-aː-j-aː- -'.
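If the trailing separators and the - - between words get in the way, the string can be turned into per-word lists of phones afterwards. This is a minimal sketch in plain Python, not part of phonemizer: split_phones is a hypothetical helper name, and it assumes the same separators as in the call above (phone='-', word=' -').

```python
def split_phones(phonemized, phone_sep='-', word_sep=' -'):
    """Split a phonemized string into a list of words, each a list of phones.

    Hypothetical helper (not part of phonemizer). It assumes the string was
    produced with Separator(phone=phone_sep, word=word_sep).
    """
    # Splitting on the word separator leaves an empty chunk after the final
    # word separator, so empty chunks are dropped.
    words = [w for w in phonemized.split(word_sep) if w]
    # Each word still ends with a phone separator, so the empty phone after
    # the last split is dropped as well.
    return [[p for p in w.split(phone_sep) if p] for w in words]


output = 'h-ʌ-m-n-eː- -ʊ-s-k-aː- -ɟ-ʌ-n-m-ə-d-ɪ-n- -m-ə-n-aː-j-aː- -'
tokens = split_phones(output)
print(tokens[0])  # ['h', 'ʌ', 'm', 'n', 'eː']
```

phonemize also takes a strip argument, which should remove the final separators from the output if you would rather not post-process the string.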

ambiSk commented 2 years ago

Thanks