eprzysinda opened 2 years ago
The text file looks pretty long; how long is the audio file? Is it possible for you to chunk it into an input TextGrid like the one described here: https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html#textgrid-format
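As a rough illustration of that chunking, here is a minimal Python sketch that writes a long-format Praat TextGrid with one interval tier from a list of (start, end, text) chunks. It assumes, following the linked corpus-structure page, that MFA reads each interval tier as one speaker and each non-empty interval as one utterance. The function names, the speaker name and the chunk times in the example are hypothetical; you would still need to pick the chunk boundaries yourself (for example by listening to the audio). Gaps between chunks are filled with empty intervals because the intervals of a Praat interval tier must tile the tier's whole time range.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, transcript_text)
Chunk = Tuple[float, float, str]


def fill_gaps(chunks: List[Chunk], duration: float) -> List[Chunk]:
    """Return intervals covering [0, duration], inserting empty ones in gaps."""
    intervals: List[Chunk] = []
    cursor = 0.0
    for start, end, text in sorted(chunks):
        if start < cursor or end <= start or end > duration:
            raise ValueError(f"overlapping or out-of-range chunk: {(start, end, text)}")
        if start > cursor:
            intervals.append((cursor, start, ""))  # silence / untranscribed gap
        intervals.append((start, end, text))
        cursor = end
    if cursor < duration:
        intervals.append((cursor, duration, ""))
    return intervals


def textgrid_text(chunks: List[Chunk], duration: float, speaker: str) -> str:
    """Render a long-format TextGrid with a single tier named after the speaker."""
    intervals = fill_gaps(chunks, duration)
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{speaker}"',
        "        xmin = 0",
        f"        xmax = {duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, 1):
        escaped = text.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical chunk boundaries for a ~118 s recording.
    demo_chunks = [(0.0, 30.0, "first chunk of the transcript"),
                   (30.5, 61.2, "second chunk of the transcript")]
    print(textgrid_text(demo_chunks, 118.04266666666666, "speaker1"))
```

Writing the result next to the audio file with the same base name (and a .TextGrid extension) would then replace the single long .lab/.txt transcript with several shorter utterances.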
The logs you attached only have the one file processed in them (MFA-Office-E1-1-1-0-30-168708333333335, which looks like it aligned successfully and is 30 seconds long), so I'm not sure whether there are other logs or files in there. Also, it looks like you're running an older version, so I'd recommend upgrading to the latest and re-running the alignment with --clean, as there have been a number of fixes and improvements made.
Certain files error in mfa validate: "utterances that need a larger beam to align"
[x] Have you updated to the latest MFA version? (version 2.0.0rc1)
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue
Here are the steps I've taken:
For reproducing your issue, please fill out the following:
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA): align.0.log, compile_train_graphs.0.log
I will also attach an example text file I used: clip_E1_3_3.txt
xmin = 0
xmax = 118.04266666666666
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 118.04266666666666
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 118.04266666666666
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0
        xmax = 118.04266666666666
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 118.04266666666666
            text = ""
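The TextGrid above has a single empty interval in both its words and phones tiers. A minimal Python sketch for spotting output files like this is shown below. It assumes that an output TextGrid whose words tier contains no text corresponds to an utterance that did not align (which seems to match this 118-second file, since only the ~30-second utterance aligned), that files are UTF-8, and that they use Praat's long text format; find_unaligned is a hypothetical helper, and a dedicated parser such as praatio would be more robust.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

NAME_RE = re.compile(r'^name = "(.*)"$')
TEXT_RE = re.compile(r'^text = "(.*)"$')


def tier_texts(textgrid: str) -> Dict[str, List[str]]:
    """Map each tier name to its interval texts (Praat long text format only)."""
    tiers: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in textgrid.splitlines():
        line = raw.strip()  # Praat may indent lines and add trailing spaces
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)
            tiers[current] = []
            continue
        m = TEXT_RE.match(line)
        if m and current is not None:
            # Praat escapes a literal quote inside a string by doubling it.
            tiers[current].append(m.group(1).replace('""', '"'))
    return tiers


def is_unaligned(textgrid: str, tier: str = "words") -> bool:
    """True if the tier is missing or every one of its intervals is empty."""
    return all(t.strip() == "" for t in tier_texts(textgrid).get(tier, []))


def find_unaligned(output_dir: str) -> List[Path]:
    """Hypothetical helper: list output TextGrids whose words tier has no text."""
    return sorted(
        p for p in Path(output_dir).rglob("*.TextGrid")
        if is_unaligned(p.read_text(encoding="utf-8"))
    )
```

Running find_unaligned on the alignment output directory would then list the files worth re-chunking or re-aligning with a larger beam.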
Desktop (please complete the following information):
Additional context
Command I use to align: mfa align [input path] "[librarypath]/librispeech-lexicon.txt" [output path]
Command I use to validate: mfa validate [input path] "[librarypath]/librispeech-lexicon.txt" [output path] --clean --beam=20
Output from validate:
INFO - Setting up corpus information...
INFO - Loading corpus from source files...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.49it/s]
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 3.0
INFO - Setting up training data...
INFO - Generating base features (mfcc)...
INFO - Generating MFCCs...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.48s/it]
INFO - Calculating CMVN...
INFO - Skipping transcription testing
INFO - Finished initializing!
Corpus
3 sound files
3 lab files
0 textgrid files
1 speakers
3 utterances
328.043 seconds total duration
Sound file read errors
Feature generation
Files without transcriptions
Transcriptions without sound files
Text file read errors
Dictionary
Out of vocabulary words
Acoustic model compatibility
Alignment
INFO - Compiling training graphs...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00, 1.69s/it]
INFO - Generating alignments...
33%|██████████████████████ | 1/3 [00:12<00:25, 12.60s/it]
0 utterances were too short to be aligned
2 utterances that need a larger beam to align
There were 2 unaligned utterances out of 3 after initial training. For details, please see:
1 utterances were successfully aligned
0 utterances were too short to be aligned
2 utterances that need a larger beam to align
There were 2 unaligned utterances out of 3 after initial training. For details, please see:
1 utterances were successfully aligned
INFO - Done! Everything took 48.94140601158142 seconds
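Roughly speaking, "need a larger beam to align" refers to pruning during alignment. As a toy illustration of the general idea (not MFA's or Kaldi's actual code), the pruning step of a beam search keeps only the partial hypotheses whose log-score is within `beam` of the best one at the current frame. If the hypothesis that would have led to a complete alignment falls outside that window at any frame, it is discarded and the utterance fails to align. That is why rerunning with a wider beam, such as the --beam=20 used above, can help at the cost of more computation. The hypothesis names and scores below are made up.

```python
from typing import Dict


def prune(scores: Dict[str, float], beam: float) -> Dict[str, float]:
    """Beam-pruning step: keep hypotheses scoring within `beam` of the best.

    Scores are log-probabilities, so higher is better and the cutoff is
    best - beam.
    """
    if not scores:
        return {}
    best = max(scores.values())
    cutoff = best - beam
    return {hyp: s for hyp, s in scores.items() if s >= cutoff}


if __name__ == "__main__":
    # Made-up scores for three partial hypotheses at a single frame.
    frame_scores = {"path_a": -10.0, "path_b": -12.0, "path_c": -25.0}
    print(sorted(prune(frame_scores, beam=10.0)))  # path_c is pruned
    print(sorted(prune(frame_scores, beam=20.0)))  # all three survive
```

A wider beam keeps more hypotheses alive at every frame, so each utterance takes longer to align, but a path that scores poorly early on is less likely to be thrown away before it can recover.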