Closed ibleaman closed 1 year ago
What version of MFA are you using and can you try it on the latest one? I'm not seeing this show up when using the xsampa test data:
https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/blob/main/tests/data/lab/xsampa.lab https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/blob/main/tests/data/dictionaries/xsampa.txt
@mmcauliffe My apologies for the delay! I'm returning to this now and still have the same issue.
I am using version 2.0.6, installed in the standard conda way (on Google Colab).
I've run both of these commands:
mfa validate --config_path yid_config.yaml yid_corp yid_corp/yiddish_lexicon.txt
mfa train --config_path yid_config.yaml --include_original_text yid_corp yid_corp/yiddish_lexicon.txt alignments
yid_config.yaml contains this line:
ignore_case: false
(I wasn't sure from the documentation whether to capitalize false but assumed from this file that I shouldn't.)
The directory yid_corp consists of matched .TextGrid and .wav files, 2 tiers each, 1 per speaker, as well as the lexicon file. The lexicon file has words with both lowercase and uppercase characters, mapped onto their phones. The capitalization is important because I have many minimal pairs.
The resulting aligned .TextGrid files (including the original utterance text tier!) are entirely in lowercase. Both oov_counts_yiddish_lexicon.txt and oovs_found_yiddish_lexicon.txt are also all lowercase. Interestingly, words.txt (inside /Documents/MFA/yid_corp_train_acoustic_model/dictionary/1_yiddish_lexicon/) shows the correct capitalization.
Am I using the configuration file incorrectly? Please let me know what you advise. Thank you!
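For anyone trying to confirm the same symptom, here is a minimal sketch (not part of MFA) that checks whether entries in an OOV list would match the lexicon if capitalization had been preserved. It assumes that the first whitespace-delimited token on each lexicon line is the word, and that each OOV line starts with the word (optionally followed by a count). The function names and sample entries are hypothetical.

```python
from collections import defaultdict


def load_lexicon_words(lines):
    """Collect headwords: the first whitespace-delimited token of each non-empty line."""
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0])
    return words


def oovs_explained_by_case(oov_lines, lexicon_words):
    """Map each OOV word to lexicon entries that differ from it only in capitalization."""
    # Index the lexicon by its case-folded form so lookups ignore case.
    by_folded = defaultdict(set)
    for word in lexicon_words:
        by_folded[word.casefold()].add(word)

    explained = {}
    for line in oov_lines:
        parts = line.split()
        if not parts:
            continue
        oov = parts[0]  # assumption: the word comes first, a count may follow
        if oov in lexicon_words:
            continue  # exact match exists, so case is not the problem here
        matches = by_folded.get(oov.casefold(), set()) - {oov}
        if matches:
            explained[oov] = sorted(matches)
    return explained


if __name__ == "__main__":
    # Hypothetical sample data; real input would be read from the two files.
    lexicon = ["Main m aj n", "main m ej n", "Berlin b E r l i n"]
    oovs = ["berlin 3", "xyzzy 1"]
    print(oovs_explained_by_case(oovs, load_lexicon_words(lexicon)))
```

If most OOV entries show up in the result, that suggests the transcripts were lowercased before dictionary lookup rather than the words being genuinely missing.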
@mmcauliffe Do you have any updates on this issue? Thanks!
I'm running mfa validate and mfa train on a small corpus with a dictionary file. My dictionary entries are case-sensitive, e.g., a word like Main would be defined with a different pronunciation than main. I see that one can specify ignore_case as a parameter, but how exactly is that accomplished?
For context, I created a file named config.yaml containing one line that sets ignore_case, and then ran
mfa validate --config_path config.yaml corp lex.txt
but based on the OOV list, all words are still being converted to lowercase.
Thanks!
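To see how much of a dictionary actually depends on capitalization, a rough sketch like the following lists the entries whose words differ only in case. It assumes each lexicon line is a word followed by whitespace-separated phones; the function name and the sample pronunciations are made up for illustration and are not taken from any real lexicon.

```python
from collections import defaultdict


def case_minimal_pairs(lexicon_lines):
    """Group lexicon entries whose headwords differ only in capitalization."""
    groups = defaultdict(dict)
    for line in lexicon_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        word, phones = parts[0], " ".join(parts[1:])
        groups[word.casefold()].setdefault(word, set()).add(phones)
    # Keep only case-folded forms that have more than one distinct spelling.
    return {
        folded: {w: sorted(p) for w, p in sorted(variants.items())}
        for folded, variants in groups.items()
        if len(variants) > 1
    }


if __name__ == "__main__":
    # Hypothetical entries standing in for lines of lex.txt.
    sample = ["Main m aj n", "main m ej n", "tog t O g"]
    for folded, variants in case_minimal_pairs(sample).items():
        print(folded, variants)
```

Every group reported here is a set of entries that would collapse into one word if the text were lowercased before lookup.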