hypnaceae closed this issue 2 years ago
Can you rerun with --debug and see exactly where it's hitting that error?
I'm running with --debug already. Very strange that it has no effect. Is it positional?
Oh sorry, my bad, I meant --verbose. That'll print the full stack trace.
Thanks. Seems like --verbose also has no effect. Console output is exactly the same.
Can you try running mfa g2p .\Desktop\oovs.txt .\Desktop\russian_mfa.zip .\Desktop\oovs_lex.txt --debug --clean?
The positional arguments can't be specified with the --option style flags, so I think that's what's causing this?
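As a rough illustration of the positional-versus-flag distinction mentioned above, here is a minimal sketch using Python's standard argparse. The parser is hypothetical and is not MFA's actual command definition; the names `first_path`, `second_path` and `output_path` are invented for the example. It only shows that positional values are bound by their order on the command line, while `--option` style flags are bound by name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for illustration only; NOT the real MFA CLI.
    parser = argparse.ArgumentParser(prog="g2p-demo")
    # Positional arguments: bound purely by their order on the command line.
    parser.add_argument("first_path")
    parser.add_argument("second_path")
    parser.add_argument("output_path")
    # Optional flags: bound by name, so they cannot stand in for a positional.
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--clean", action="store_true")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    args = parser.parse_args(["oovs.txt", "model.zip", "oovs_lex.txt", "--debug", "--clean"])
    print(args.first_path, args.second_path, args.output_path, args.debug, args.clean)

    # Swapping the first two values silently changes what each name receives.
    swapped = parser.parse_args(["model.zip", "oovs.txt", "oovs_lex.txt"])
    print(swapped.first_path, swapped.second_path)
```

With a parser like this, passing the paths in the wrong order raises no parse error; the mistake only surfaces later, when the program tries to use each file.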
Sure, here's the result:
(base) PS C:\Users\admin\Desktop> mfa g2p .\russian_mfa.zip .\oovs.txt .\oovs_lex.txt --debug --clean
Generating pronunciations from G2P model
WARNING! The following graphemes were not found in the specified G2P model: - a b c d e g h i k l m n o p r s t u v x z а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я ё
montreal_forced_aligner.exceptions.G2PError: Previously trained Phonetisaurus models from 1.1 and earlier are not currently supported. Please retrain your model using 2.0+
That's with this model: https://github.com/MontrealCorpusTools/mfa-models/releases/tag/g2p-russian_mfa-v2.0.0a
After that, I ran mfa download g2p russian_g2p and mfa g2p russian_g2p .\oovs.txt .\oovs_lex.txt and it actually started generating pronunciations, though they look a bit weird, for example: яшеньку jA S jE nj k u. Not sure what phoneset that is, but I need the output phoneset to be the same as that in the latest dict.
Another thing I noticed was that despite installing version 2.0.6, mfa version returns 2.0.0a21. The package filename in miniconda3/pkgs/ has 2.0.6 and it's the only version of MFA installed on my machine, so I'm not sure what's going on there.
Thanks for the support so far :)
Can you try rerunning with --clean? Looks like it's still using the 1.0 phone set.
No difference in output :/
Hmm, ok so it's working fine on my local machine, can you maybe delete the Documents/MFA folder, redownload the russian_mfa g2p model and re-run? I feel like there's some sticky files somewhere with the original 1.0 model.
Also weird about the version, did you maybe install it from pip at some point in addition to conda? What does which mfa (Unix) or where mfa (Windows) return?
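For anyone chasing a similar version mismatch, the check suggested above can also be done from inside the active Python environment. This is a minimal sketch using only the standard library. The distribution name `Montreal_Forced_Aligner` is an assumption about how the package is registered, so adjust it if the lookup returns `None` even though MFA is installed.

```python
import shutil
from importlib import metadata


def describe_mfa_install(dist_name: str = "Montreal_Forced_Aligner") -> dict:
    """Report the first `mfa` executable on PATH and the package version Python sees.

    `dist_name` is an assumed distribution name, not confirmed by this thread.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed in THIS interpreter's environment
    return {
        "executable": shutil.which("mfa"),  # None if no `mfa` is on PATH
        "package_version": version,
    }


if __name__ == "__main__":
    print(describe_mfa_install())
```

If the executable path points outside the conda environment you expect, or the reported version differs from what `mfa version` prints, that would suggest two installations shadowing each other.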
Nope, same exact output. :( Mind you this machine has never seen pre-2.0.0 MFA. I had some version (possibly 2.0.1) installed earlier but did a reinstall of miniconda recently which included uninstalling old packages. In any case, I got what I needed by running on a remote linux machine... And if it works natively in Windows for you then it's probably user error on my part or some old files hidden somewhere on my machine. I'll close the issue then, sincere thanks again.
I'm working with a fresh conda and MFA installation. I've generated a list of OOVs in my corpus with mfa validate, now trying to run G2P on the output .txt file to supplement the dictionary. Language is Russian, though I had the same error trying to run G2P on Czech. I'm using the 2.0.0a Russian G2P model for this case.
Here's the error:
I'm not sure what it's failing to read, as there's no traceback. The input oovs text file is taken directly from the output of mfa validate, and is encoded in utf-8. See below: oovs_found_russian_mfa.txt
Are there any workarounds I could try?
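One way to double-check the UTF-8 claim, and to see which characters a G2P model would be asked to handle, is to inspect the word list directly. The sketch below assumes one word per line, which is an assumption about the file layout, and it treats each Unicode code point as a "grapheme", which is a simplification. The function name `inspect_word_list` is made up for this example.

```python
import codecs
from pathlib import Path


def inspect_word_list(path: str) -> dict:
    """Validate that a word list decodes as UTF-8 and summarise its characters.

    Assumes one word per line (an assumption, not confirmed by the thread).
    """
    raw = Path(path).read_bytes()
    # A leading byte-order mark is valid UTF-8 but can trip up some tools.
    has_bom = raw.startswith(codecs.BOM_UTF8)
    # Raises UnicodeDecodeError if the file is not valid UTF-8.
    text = raw.decode("utf-8-sig")
    words = [line.strip() for line in text.splitlines() if line.strip()]
    # Code points stand in for graphemes here; combining marks are counted separately.
    characters = sorted({ch for word in words for ch in word})
    return {"has_bom": has_bom, "word_count": len(words), "characters": characters}


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(inspect_word_list(sys.argv[1]))
```

Comparing the reported characters against the grapheme list in the G2P warning can help tell apart an encoding problem from a genuine mismatch between the word list and the model.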