ahkarami closed this 2 years ago
As far as I'm aware, we have not formally tried predicting phonemes and comparing those models' accuracy against grapheme-predicting models for any language. If anyone on the team knows better, please correct me.
I have trained a few QuartzNet models on mostly unambiguous phonemes from CMUdict (where heteronyms/homographs were stripped out or disambiguated), but because the ambiguous words were removed, those numbers wouldn't be representative of general use.
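To make "stripped out" concrete, here is a minimal sketch of one crude way to filter a CMUdict-style lexicon down to words with a single pronunciation. It is not the actual preprocessing used for those models. The inline sample, the `WORD(1)` variant suffix, and the helper names `load_lexicon` and `keep_unambiguous` are assumptions for illustration. Note that dropping every word with more than one listed pronunciation also drops ordinary pronunciation variants, not only true heteronyms.

```python
import re
from collections import defaultdict

# Hypothetical sample in a CMUdict-like format: WORD, whitespace, then phonemes.
# Alternate pronunciations are assumed to carry a "(n)" suffix on the word.
SAMPLE_LEXICON = """\
CAT  K AE1 T
DOG  D AO1 G
READ  R EH1 D
READ(1)  R IY1 D
LEAD  L EH1 D
LEAD(1)  L IY1 D
"""


def load_lexicon(text):
    """Parse lexicon text into {word: [pronunciation, ...]}."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comment lines
            continue
        head, *phones = line.split()
        word = re.sub(r"\(\d+\)$", "", head)  # READ(1) -> READ
        if phones not in lexicon[word]:  # ignore exact duplicate entries
            lexicon[word].append(phones)
    return dict(lexicon)


def keep_unambiguous(lexicon):
    """Keep only words that have exactly one listed pronunciation."""
    return {word: prons[0] for word, prons in lexicon.items() if len(prons) == 1}


single = keep_unambiguous(load_lexicon(SAMPLE_LEXICON))
print(sorted(single))  # ['CAT', 'DOG']
```

Disambiguating instead of dropping (for example, choosing between the two readings of "READ" from context) would need part-of-speech or sentence-level information that a plain lexicon lookup does not have.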
I'd hazard a guess that, if your G2P model were accurate, a phoneme prediction model would perform a little better than a grapheme (raw text) prediction model due to the lack of ambiguity, at least for English. But then you'd have to translate back from phonemes to words (assuming you're not just using the phonemes for something else) and figure out how you want to deal with accents and regional pronunciations, which is another story.
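As a rough illustration of why translating back from phonemes to words is its own problem, the sketch below inverts a tiny pronunciation lexicon and shows that one phoneme sequence can map to several words. The sample entries and the helper name `invert_lexicon` are hypothetical; a real system would typically need a language model or a decoder with a lexicon to pick among the candidates.

```python
from collections import defaultdict

# Hypothetical word -> phonemes entries in ARPAbet-style notation.
SAMPLE_PRONUNCIATIONS = {
    "RED": ["R", "EH1", "D"],
    "READ": ["R", "EH1", "D"],  # past-tense reading only, for illustration
    "TWO": ["T", "UW1"],
    "TOO": ["T", "UW1"],
    "CAT": ["K", "AE1", "T"],
}


def invert_lexicon(pronunciations):
    """Map each phoneme sequence to every word that shares it."""
    inverse = defaultdict(list)
    for word, phones in pronunciations.items():
        inverse[tuple(phones)].append(word)
    return {phones: sorted(words) for phones, words in inverse.items()}


candidates = invert_lexicon(SAMPLE_PRONUNCIATIONS)
for phones, words in candidates.items():
    status = "ambiguous" if len(words) > 1 else "unique"
    print(" ".join(phones), "->", words, f"({status})")
```

Here `R EH1 D` comes back as both "READ" and "RED", so the ambiguity removed on the acoustic side reappears at the phoneme-to-word step.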
Thanks for your great explanation. Best
Hi,
Thank you for your great repo. I have 2 questions:
1- Is it better to use G2P for English ASR, or to just use the raw text? If yes, which G2P model do you suggest for this?
2- Is it better to use G2P for ASR in other languages (for example, Arabic ASR)?
Best