No delimiter in predictions from multi-letter phoneme codes

as-ideas / DeepPhonemizer

Grapheme to phoneme conversion with deep learning.

MIT License

361 stars 41 forks source link

Thank you for putting this out there. I'm trying to train the model myself on English CMU pronunciations, which have multi-letter phoneme codes. I structure my phoneme transcriptions as lists, for example:

('en_us', 'timbre', ['T','IH1','M','B','ER0'])

The model trains fine, but when I ask for transcriptions (via, say phonemise_list()), the model output doesn't put delimiters between the phonemes; so it's version of 'timbre' is:

'TAY1MBER0'

This is not helpful, and also not what the pre-trained CMU model does - It produces output like:

'[T][AY1][M][B][ER0]'

How can I adjust the config file or the calls to train() so that I get back something with delimiters between the phonemes?

as-ideas / DeepPhonemizer

No delimiter in predictions from multi-letter phoneme codes #26