as-ideas / DeepPhonemizer

Grapheme to phoneme conversion with deep learning.
MIT License

No delimiter in predictions from multi-letter phoneme codes #26

Closed thelahunginjeet closed 2 years ago

thelahunginjeet commented 2 years ago

Thank you for putting this out there. I'm trying to train the model myself on English CMU pronunciations, which have multi-letter phoneme codes. I structure my phoneme transcriptions as lists, for example:

('en_us', 'timbre', ['T','IH1','M','B','ER0'])

The model trains fine, but when I ask for transcriptions (via, say, phonemise_list()), the model output doesn't put delimiters between the phonemes, so its version of 'timbre' is:

'TAY1MBER0'

This is not helpful, and it's also not what the pre-trained CMU model does, which produces output like:

'[T][AY1][M][B][ER0]'

How can I adjust the config file or the calls to train() so that I get back something with delimiters between the phonemes?

thelahunginjeet commented 2 years ago

I figured this out, in case anyone else is having the issue. The trick is to include delimiters in both your phoneme inventory and the transcriptions. So, in the config file, instead of:

phoneme_symbols: ['T', 'UW1', 'S', ...]

you want

phoneme_symbols: ['[T]', '[UW1]', '[S]', ...]

And the training/test samples look like:

('en_us', 'timbre', ['[T]', '[IH1]', '[M]', '[B]', '[ER0]'])

If you do that, you'll get predictions that look like what the pre-trained CMU model produces.
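A minimal sketch of the data preparation this implies, assuming you have CMU-style phoneme lists. The helper names (bracket_phonemes, unbracket) are hypothetical, not part of DeepPhonemizer; they just wrap each code in brackets for training and split a bracketed prediction string back into codes:

```python
def bracket_phonemes(phonemes):
    """Wrap each multi-letter phoneme code in brackets, e.g. 'IH1' -> '[IH1]'.

    Hypothetical helper: apply this to both the phoneme_symbols inventory
    and every training/test sample before training.
    """
    return [f'[{p}]' for p in phonemes]


def unbracket(prediction):
    """Split a bracketed prediction like '[T][IH1][M]' back into ['T', 'IH1', 'M']."""
    if not prediction:
        return []
    # Drop the outer brackets, then split on the '][' between codes.
    return prediction[1:-1].split('][')


# Training sample in the format shown above:
sample = ('en_us', 'timbre', bracket_phonemes(['T', 'IH1', 'M', 'B', 'ER0']))
# -> ('en_us', 'timbre', ['[T]', '[IH1]', '[M]', '[B]', '[ER0]'])
```

With this, predictions come back in the same '[T][IH1][M][B][ER0]' form as the pre-trained CMU model, and unbracket recovers the individual phoneme codes.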