Dictionary format is unclear.

MontrealCorpusTools / mfa-models

Collection of pretrained models for the Montreal Forced Aligner

Creative Commons Attribution 4.0 International

Hello and thank you for your work.

I was working with russian_mfa.dict (downloaded via mfa model download dictionary russian_mfa), and its format is unclear to me. A typical line looks like ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f. I understand that the word comes at the beginning of the line and its phonemes at the end, and that the first number is some kind of probability, but I can't figure out what the other three numbers mean. I looked at the documentation here, but it says nothing about the format :(
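To make the question concrete, here is a minimal Python sketch of how such a line can be split into its three visible parts. It assumes whitespace-separated fields and treats every purely numeric token directly after the word as a "number column"; the function name parse_dict_line is hypothetical, and the sketch makes no claim about what the numbers mean.

```python
import re

# Matches plain non-negative numbers such as "1" or "0.0".
NUMERIC = re.compile(r"^\d+(\.\d+)?$")

def parse_dict_line(line):
    """Split a dictionary line into (word, numbers, phones).

    Assumption: fields are whitespace-separated, and the numeric
    columns (however many there are) sit between the word and the phones.
    """
    tokens = line.split()
    word, rest = tokens[0], tokens[1:]
    numbers = []
    # Consume leading numeric tokens; everything after them is a phone.
    while rest and NUMERIC.match(rest[0]):
        numbers.append(float(rest.pop(0)))
    return word, numbers, rest

word, numbers, phones = parse_dict_line(
    "ноутбуков 1 0.0 0.0 0.0 n̪ o ʊ d̪ b u k ə f"
)
print(word, numbers, phones)
```

For the example line this separates the word, four numeric values, and nine phone symbols.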

This matters because the output of mfa g2p russian_mfa oov.txt oov_phonemes.txt has a different format, жбанков ('ʐ', 'b', 'a', 'n̪', 'k', 'ə', 'f'), so it's unclear how to merge the existing dictionary with the OOV words.
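One possible workaround, sketched below in Python, is to convert each G2P output line into a "word, then space-separated phones" line. This assumes the G2P line is a word followed by a Python-style tuple literal, as in the example above; the helper names parse_g2p_line and to_dict_line are hypothetical. Whether the aligner accepts entries without the numeric columns, and whether the separator should be a tab, are assumptions not confirmed here.

```python
import ast

def parse_g2p_line(line):
    """Split "word ('p1', 'p2', ...)" into (word, [phones]).

    Assumption: the part after the first whitespace run is a valid
    Python tuple literal of phone strings.
    """
    word, tuple_text = line.strip().split(None, 1)
    phones = ast.literal_eval(tuple_text)  # safe parsing of the literal
    return word, list(phones)

def to_dict_line(word, phones):
    """Format an entry as word<TAB>phones, with no numeric columns (assumed layout)."""
    return f"{word}\t{' '.join(phones)}"

word, phones = parse_g2p_line("жбанков ('ʐ', 'b', 'a', 'n̪', 'k', 'ə', 'f')")
print(to_dict_line(word, phones))
```

Lines produced this way could then be appended to a copy of the downloaded dictionary, provided the mixed layout is accepted.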

Could you please explain the format of russian_mfa.dict, or point me to where I can read about it? Best wishes

Dictionary format is unclear. #8