Open iamanigeeit opened 7 months ago
In general, I would not recommend using G2P for common words. The grapheme sequence "my" is overwhelmingly going to be pronounced as "mʲ i" because of the sheer number of words that end in "my", like "alchemy" and "anatomy", and these words are weighted the same as the word "my". The use case for G2P is generating pronunciations for low-frequency words, not the simple words you're using above: those are all covered by a pronunciation dictionary and behave quite differently from longer, less frequent words. You'd get better results by using dictionary lookups with a pronunciation dictionary and only using G2P as a fallback.
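The lookup-then-fallback idea can be sketched roughly as follows. This is not MFA's API: `pronounce`, `toy_lexicon`, and `toy_g2p` are hypothetical names, the lexicon entry is illustrative, and the fallback is a placeholder that only stands in for a real G2P model call.

```python
from typing import Callable, Dict


def pronounce(word: str, lexicon: Dict[str, str], g2p_fallback: Callable[[str], str]) -> str:
    """Return the dictionary pronunciation if the word is known, else ask G2P."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return g2p_fallback(key)


# Toy stand-ins, not real MFA data or APIs.
toy_lexicon = {"my": "m aj"}


def toy_g2p(word: str) -> str:
    # Placeholder "model": emits one symbol per letter.
    return " ".join(word)


print(pronounce("My", toy_lexicon, toy_g2p))    # dictionary hit
print(pronounce("zorp", toy_lexicon, toy_g2p))  # falls back to the placeholder G2P
```

In a real setup the lexicon would presumably be loaded from the pronunciation dictionary file and the fallback would wrap the G2P model, so common words never reach G2P at all.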
@mmcauliffe Thanks for the reply. I understand everything is geared towards aligning audio, not for other applications. Is there a way to do normal G2P in MFA or do I have to customize G2P to check the dictionary for known words?
This seems to be a natural extension for MFA, since the phoneme definitions are universal (?) and based on actual acoustics. Of the multilingual G2P models available, MFA is definitely better than espeak-ng, while NeMo G2P requires downloading the entire framework. Not sure about epitran (will try next).
Hello,
I have been experimenting with the pretrained English (US) MFA model in Python.
I see problems with simple words like "my" or "hehe":
Sometimes this can be solved by increasing num_pronunciations:
The first one happens to be correct, but this is only a consequence of the lexicographic sorting at
https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/blob/45ef83b07bacd4c1cd256d1bce2aca658b1c9e45/montreal_forced_aligner/g2p/generator.py#L184
So it's wrong in other cases:
The cross product also makes a sentence with $n$ words and $k$ candidates per word take $O(k^n)$ time, which is surely not what we want when we only need the best match:
https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/blob/45ef83b07bacd4c1cd256d1bce2aca658b1c9e45/montreal_forced_aligner/g2p/generator.py#L390
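To make the blow-up concrete, here is a small standalone sketch of what a cross product over per-word candidates costs. It does not reproduce the MFA code linked above; the function names and the toy candidate strings are made up for illustration.

```python
from itertools import product
from typing import List


def sentence_candidates(per_word: List[List[str]]) -> List[str]:
    """Enumerate every combination of per-word candidates (the cross product)."""
    return [" | ".join(combo) for combo in product(*per_word)]


def candidate_count(per_word: List[List[str]]) -> int:
    """Number of sentence-level candidates, computed without enumerating them."""
    total = 1
    for candidates in per_word:
        total *= len(candidates)
    return total


# Toy input: n = 4 words with k = 3 placeholder candidates each.
toy = [["w%d_p%d" % (i, j) for j in range(3)] for i in range(4)]

print(candidate_count(toy))           # k ** n combinations
print(len(sentence_candidates(toy)))  # same number, but fully materialized
```

With k = 3, a 10-word sentence already yields 3 ** 10 = 59049 combinations, even though only one of them is wanted.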
Maybe the most straightforward method is to take top_rewrite for each word and just join them?
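A rough sketch of that idea, assuming each word's candidates come with a cost where lower is better. The `min` call only stands in for whatever top_rewrite would return for a word; `best_per_word`, the costs, and the phone strings are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (pronunciation, cost); lower cost is assumed better


def best_per_word(scored: List[List[Candidate]], sep: str = " ") -> str:
    """Pick the cheapest candidate for each word independently, then join."""
    best = [min(candidates, key=lambda c: c[1])[0] for candidates in scored]
    return sep.join(best)


# Toy data: made-up pronunciations and costs for two words.
toy_scored = [
    [("m aj", 1.0), ("m i", 2.5)],
    [("k ae t", 0.4), ("k ey t", 0.9)],
]

print(best_per_word(toy_scored, sep=" | "))
```

This does one pass over n words with k candidates each, so the work grows as O(n·k) instead of O(k^n).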