MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.29k stars, 242 forks

missing some phones in the generated TextGrid files [BUG] #474

Open zaynabmu opened 2 years ago

zaynabmu commented 2 years ago

Debugging checklist

[ ] Have you updated to latest MFA version? yes 2.0.1
[ ] Have you tried rerunning the command with the --clean flag? yes

Describe the issue

I trained a new acoustic model on my language (Arabic), following the documentation step by step. However, the generated TextGrid output contains "spn" intervals and some phones are missing. I want to use the TextGrid files to train a FastSpeech model, but the synthesized voices came out worse, and I traced the cause to the missing phones in the TextGrid files. Could you please help me?

For Reproducing your issue Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Arabic
    • How many files/speakers? 23/23
    • Are you using lab files or TextGrid files for input? lab files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? no
    • If it's a custom dictionary, what is the phoneset? Ara
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one? no
    • If it's a model you've trained, what data was it trained on? audio files with wav format

Log file Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

Additional context Add any other context about the problem here. When I run mfa validate, everything seems okay.

mmcauliffe commented 2 years ago

The "spn" intervals are generated for bracketed words or words not in the dictionary. In the root temporary directory (~/Documents/MFA/{corpus_name}), there will be an "oovs_found.txt" file. If you add those words to your dictionary with pronunciations, then they should show up correctly in the aligned TextGrid.
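For anyone who wants to check this before re-running the aligner, here is a minimal sketch of that comparison. It assumes that oovs_found.txt lists one word per line (only the first whitespace-separated token of each line is used) and that the dictionary is a plain-text file with a word followed by its phones on each line. The function names are hypothetical and are not part of MFA.

```python
from pathlib import Path


def load_dictionary_words(dictionary_path):
    """Return the set of words that have at least one pronunciation listed."""
    words = set()
    for line in Path(dictionary_path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if len(fields) >= 2:  # a word followed by at least one phone
            words.add(fields[0])
    return words


def missing_oovs(oov_path, dictionary_path):
    """Return the sorted OOV words that still have no dictionary entry."""
    known = load_dictionary_words(dictionary_path)
    missing = set()
    for line in Path(oov_path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        # Assumption: the word is the first token; anything after it is ignored.
        if fields and fields[0] not in known:
            missing.add(fields[0])
    return sorted(missing)
```

Calling missing_oovs with the path to oovs_found.txt and the path to your dictionary returns the words that would still be aligned as "spn" for lack of an entry. An empty list suggests the remaining "spn" intervals have some other cause.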

peanut1101 commented 2 years ago

@mmcauliffe @zaynabmu I have encountered the same problem. Have you solved it? Please send me an email chen347444625@163.com

zaynabmu commented 2 years ago

Yes, it's solved. Follow the instructions that mmcauliffe gave and it should work for you.

peanut1101 commented 2 years ago

@zaynabmu Thanks

Lakhjeet1082 commented 4 months ago

I have added all OOV words with their pronunciations to my dictionary and trained the acoustic model again for both speakers, but I am still getting "spn" in the newly generated TextGrid files. Please guide me on this.

chirila commented 2 months ago

This also happens (I think) if the words contain segments that aren't in the phone set.
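A rough way to test that hypothesis is to look for dictionary entries whose pronunciations use phones outside the model's phone set. The sketch below assumes you already have the phone set as a Python set (how you obtain it depends on your setup) and that each dictionary line is a word followed by its phones. Numeric columns, such as pronunciation probabilities, are skipped as a simplification. The function names are hypothetical and are not part of MFA.

```python
def _is_number(token):
    """Return True if the token parses as a float (e.g. a probability column)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def pronunciations_with_unknown_phones(dictionary_lines, phone_set):
    """Map each word to the phones in its entries that are not in phone_set."""
    problems = {}
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a word with no pronunciation
        word = fields[0]
        # Simplification: treat every non-numeric field after the word as a phone.
        phones = [f for f in fields[1:] if not _is_number(f)]
        unknown = [p for p in phones if p not in phone_set]
        if unknown:
            problems.setdefault(word, []).extend(unknown)
    return problems
```

For example, with the phone set {"b", "a", "t"} and the lines "bat b a t" and "cat k a t", only "cat" is reported, with "k" as the unknown phone.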