Open WangHelin1997 opened 1 year ago
There is this warning:
There were 463 pronunciations in the dictionary that were ignored for containing one of 32 phones not present
in the trained acoustic model. Please run `mfa validate` to get more details.
You say you're using the english_mfa dictionary, but the command uses english_us_arpa, so depending on what's in the "new_lexicon.txt" dictionary, there might not be anything meaningful to output. It should still output something, though probably nothing very useful (lots of OOVs, for example), so I would fix the dictionary/acoustic model mismatch first.
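A minimal sketch of how such a mismatch could be checked by hand, separately from `mfa validate`. It assumes a whitespace-separated dictionary where the first field is the word and any purely numeric fields after it are probabilities rather than phones. The function name `find_unknown_phones` and the example data are hypothetical and not part of MFA.

```python
from collections import Counter


def is_number(token: str) -> bool:
    """Return True if the token parses as a float (assumed to be a probability column)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def find_unknown_phones(dictionary_lines, model_phones):
    """Count, per phone, the pronunciations that use a phone absent from model_phones."""
    unknown = Counter()
    for line in dictionary_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Everything after the word that is not numeric is treated as a phone.
        phones = [field for field in fields[1:] if not is_number(field)]
        for phone in set(phones) - set(model_phones):
            unknown[phone] += 1
    return unknown


# Hypothetical example: one ARPABET-style entry checked against IPA-style model phones.
example_lines = ["hello\tHH AH0 L OW1", "hi\t0.99\th aj"]
example_model_phones = {"h", "aj"}
print(find_unknown_phones(example_lines, example_model_phones))
```

A non-empty result means the dictionary and the acoustic model use different phone sets, which is the situation the warning describes.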
Hi! I have checked it, and this is the output of my second run.
(mfa-dev) dean@inspur:/data/dean/whl-2022/Speech-Backbones/Montreal-Forced-Aligner$ mfa align -j 10 --clean /data/dean/whl-2022/Speech-Backbones/DiffVC/processed_data /home/dean/Documents/MFA/pretrained_models/dictionary/english_mfa.dict english_mfa /home/dean/Documents/MFA/aligned
INFO - Setting up corpus information...
INFO - Loading corpus from source files...
21379it [00:03, 6746.17it/s]
INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
INFO - Initializing multiprocessing jobs...
INFO - Creating corpus split for feature generation...
INFO - Generating base features (mfcc)...
INFO - Generating MFCCs...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 21350/21379 [00:48<00:00, 438.39it/s]
INFO - Calculating CMVN...
INFO - Creating corpus split with features...
INFO - Compiling training graphs...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:02<00:00, 10124.11it/s]
INFO - Performing first-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2400.72it/s]
INFO - Calculating fMLLR for speaker adaptation...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:01<00:00, 15.29it/s]
INFO - Performing second-pass alignment...
INFO - Generating alignments...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:08<00:00, 2386.32it/s]
INFO - Exporting TextGrids to /home/dean/Documents/MFA/aligned...
INFO - Collecting phone and word alignments from alignment lattices...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 21379/21379 [00:03<00:00, 6936.33it/s]
0%| | 0/21379 [00:02<?, ?it/s]
INFO - Finished exporting TextGrids to /home/dean/Documents/MFA/aligned!
INFO - Done! Everything took 97.89735674858093 seconds
It seems that everything works well except the exporting part. And here is the log file.
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Beginning run for pretrained_aligner on processed_data
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Using multiprocessing with 10
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Set up logger for MFA version: 2.0.6
2023-02-02 01:08:05,524 - processed_data_pretrained_aligner - DEBUG - Cleaned previous run
2023-02-02 01:08:05,525 - processed_data_pretrained_aligner - DEBUG - There were some differences in the current run compared to the last one. This may cause issues, run with --clean, if you hit an error.
2023-02-02 01:08:06,482 - processed_data_pretrained_aligner - DEBUG - Using IPA
2023-02-02 01:08:12,283 - processed_data_pretrained_aligner - DEBUG - Loaded dictionary in 5.805937767028809
2023-02-02 01:08:18,235 - processed_data_pretrained_aligner - DEBUG - Wrote lexicon information in 5.951487064361572
2023-02-02 01:08:18,255 - processed_data_pretrained_aligner - INFO - Setting up corpus information...
2023-02-02 01:08:18,255 - processed_data_pretrained_aligner - DEBUG - Could not load from temp
2023-02-02 01:08:18,256 - processed_data_pretrained_aligner - INFO - Loading corpus from source files...
2023-02-02 01:08:20,997 - processed_data_pretrained_aligner - DEBUG - Processing queue: 1.1797338810000007
2023-02-02 01:08:21,550 - processed_data_pretrained_aligner - DEBUG - Parsed corpus directory with 10 jobs in 1.7061748439999995 seconds
2023-02-02 01:08:21,587 - processed_data_pretrained_aligner - INFO - Found 28 speakers across 21379 files, average number of utterances per speaker: 763.5357142857143
2023-02-02 01:08:21,588 - processed_data_pretrained_aligner - DEBUG - Loaded corpus in 3.352215528488159
2023-02-02 01:08:21,588 - processed_data_pretrained_aligner - INFO - Initializing multiprocessing jobs...
2023-02-02 01:08:21,618 - processed_data_pretrained_aligner - DEBUG - Initialized jobs in 0.03055095672607422
2023-02-02 01:08:21,618 - processed_data_pretrained_aligner - INFO - Creating corpus split for feature generation...
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - DEBUG - Created corpus split directory in 0.1694188117980957
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - INFO - Generating base features (mfcc)...
2023-02-02 01:08:21,788 - processed_data_pretrained_aligner - INFO - Generating MFCCs...
2023-02-02 01:09:10,991 - processed_data_pretrained_aligner - INFO - Calculating CMVN...
2023-02-02 01:09:11,589 - processed_data_pretrained_aligner - INFO - Creating corpus split with features...
2023-02-02 01:09:11,832 - processed_data_pretrained_aligner - DEBUG - Generated features in 50.044026374816895
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Calculated oovs found in 0.018543004989624023
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Setting up corpus took 65.37393045425415 seconds
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG -
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - ====ACOUSTIC MODEL INFO====
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model root directory: /home/dean/Documents/MFA/extracted_models/acoustic
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model dirname: /home/dean/Documents/MFA/extracted_models/acoustic/english_mfa_acoustic
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta path: /home/dean/Documents/MFA/extracted_models/acoustic/english_mfa_acoustic/meta.json
2023-02-02 01:09:11,851 - processed_data_pretrained_aligner - DEBUG - Acoustic model meta information:
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG - architecture: gmm-hmm
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_energy: false
  use_pitch: true
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.19
final_silence_correction: 2.165
initial_silence_probability: 0.2
oov_phone: spn
optional_silence_phone: sil
phone_set_type: IPA
phone_type: triphone
phones: !!set
  a: null
  aj: null
  aw: null
  "a\u02D0": null
  b: null
  "b\u02B2": null
  c: null
  "c\u02B0": null
  d: null
  "d\u0292": null
  "d\u02B2": null
  "d\u032A": null
  e: null
  ej: null
  f: null
  "f\u02B2": null
  h: null
  i: null
  "i\u02D0": null
  j: null
  k: null
  "k\u02B0": null
  l: null
  m: null
  "m\u02B2": null
  "m\u0329": null
  n: null
  "n\u0329": null
  o: null
  ow: null
  p: null
  "p\u02B0": null
  "p\u02B2": null
  s: null
  t: null
  "t\u0283": null
  "t\u02B0": null
  "t\u02B2": null
  "t\u032A": null
  u: null
  "u\u02D0": null
  v: null
  "v\u02B2": null
  w: null
  z: null
  "\xE6": null
  "\xE7": null
  "\xF0": null
  "\u014B": null
  "\u0250": null
  "\u0251": null
  "\u0251\u02D0": null
  "\u0252": null
  "\u0252\u02D0": null
  "\u0254": null
  "\u0254j": null
  "\u0259": null
  "\u0259w": null
  "\u025A": null
  "\u025B": null
  "\u025B\u02D0": null
  "\u025C": null
  "\u025C\u02D0": null
  "\u025D": null
  "\u025F": null
  "\u0261": null
  "\u026A": null
  "\u026B": null
  "\u026B\u0329": null
  "\u0271": null
  "\u0272": null
  "\u0279": null
  "\u027E": null
  "\u027E\u02B2": null
  "\u0283": null
  "\u0289": null
  "\u0289\u02D0": null
  "\u028A": null
  "\u028E": null
  "\u0292": null
  "\u0294": null
  "\u03B8": null
silence_probability: 0.19976790028347458
train_date: '2022-05-14 11:48:51.797073'
training:
  audio_duration: 13146329.456449727
  average_log_likelihood: -0.01580790029624346
  num_oovs: 0
  num_speakers: 77861
  num_utterances: 2411758
version: 2.0.0rc4.dev19+ged818cb.d20220404
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG -
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - DEBUG - Setup for alignment in 66.33342623710632 seconds
2023-02-02 01:09:11,858 - processed_data_pretrained_aligner - INFO - Compiling training graphs...
2023-02-02 01:09:13,972 - processed_data_pretrained_aligner - DEBUG - Compiling training graphs took 2.1131227016448975
2023-02-02 01:09:13,974 - processed_data_pretrained_aligner - INFO - Performing first-pass alignment...
2023-02-02 01:09:13,974 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-02 01:09:22,880 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.905574083328247
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -50.45720663988212
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - DEBUG - Compiling information took 2.249269723892212
2023-02-02 01:09:25,137 - processed_data_pretrained_aligner - INFO - Calculating fMLLR for speaker adaptation...
2023-02-02 01:09:26,971 - processed_data_pretrained_aligner - DEBUG - Fmllr calculation took 1.8337764739990234
2023-02-02 01:09:26,973 - processed_data_pretrained_aligner - INFO - Performing second-pass alignment...
2023-02-02 01:09:26,974 - processed_data_pretrained_aligner - INFO - Generating alignments...
2023-02-02 01:09:35,933 - processed_data_pretrained_aligner - DEBUG - Alignment round took 8.95933198928833
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - DEBUG - Average per frame likelihood for alignment: -50.68226060948395
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - DEBUG - Compiling information took 2.264148712158203
2023-02-02 01:09:38,203 - processed_data_pretrained_aligner - INFO - Exporting TextGrids to /home/dean/Documents/MFA/aligned...
2023-02-02 01:09:38,204 - processed_data_pretrained_aligner - INFO - Collecting phone and word alignments from alignment lattices...
2023-02-02 01:09:43,418 - processed_data_pretrained_aligner - INFO - Finished exporting TextGrids to /home/dean/Documents/MFA/aligned!
2023-02-02 01:09:43,419 - processed_data_pretrained_aligner - DEBUG - Exported TextGrids in a total of 2.1298766136169434 seconds
2023-02-02 01:09:43,421 - processed_data_pretrained_aligner - INFO - Done! Everything took 97.89735674858093 seconds
Is there anything in the "export_textgrids..log" files in ~/Documents/MFA/processed_data/alignment/log? I think that would be the directory structure, but I've been working with a new version that does things slightly differently, so I'm a bit fuzzy on the exact location of the TextGrid export logs.
Hi. I couldn't find the folder ~/Documents/MFA/processed_data/alignment/
Oh under pretrained_aligner/log then, sorry!
They all just say "Done!"
Well that logging is less than helpful. I'm very puzzled. And they didn't get stuck in pretrained_aligner/textgrids to avoid overwriting an existing folder?
I'm hoping to get 2.1 out tonight, which is outputting jsons correctly for me, so I'll let you know when that's live and you can try it out? (and hopefully it doesn't introduce other issues for you).
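To check where the TextGrids ended up, if anywhere, a rough sketch like the following could compare the corpus with an output folder. It assumes the output mirrors the corpus directory layout (for example, one subfolder per speaker) and that the audio files are .wav or .flac. The helper name `missing_textgrids` and the demo files are hypothetical. It could be pointed at either the requested output directory or the pretrained_aligner/textgrids folder mentioned above.

```python
import tempfile
from pathlib import Path


def missing_textgrids(corpus_dir, output_dir, audio_suffixes=(".wav", ".flac")):
    """List audio files (relative path, no extension) that have no matching .TextGrid."""
    corpus_dir, output_dir = Path(corpus_dir), Path(output_dir)
    # Relative paths of every exported TextGrid, with the extension stripped.
    exported = {
        path.relative_to(output_dir).with_suffix("")
        for path in output_dir.rglob("*.TextGrid")
    }
    missing = []
    for audio in sorted(corpus_dir.rglob("*")):
        if audio.suffix.lower() in audio_suffixes:
            stem = audio.relative_to(corpus_dir).with_suffix("")
            if stem not in exported:
                missing.append(stem)
    return missing


# Demo on throwaway directories: two audio files, only one exported TextGrid.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp, "corpus")
    aligned = Path(tmp, "aligned")
    (corpus / "spk1").mkdir(parents=True)
    (aligned / "spk1").mkdir(parents=True)
    (corpus / "spk1" / "a.wav").touch()
    (corpus / "spk1" / "b.wav").touch()
    (aligned / "spk1" / "a.TextGrid").touch()
    demo_missing = missing_textgrids(corpus, aligned)

print(demo_missing)
```

If every file in the corpus shows up as missing, the export step produced nothing at all, which would match the 0/21379 progress bar in the run above.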
Thanks so much. I will try the new version.
Did you manage to solve this? Was it because the recordings were too long?
Debugging checklist
[ ] Have you updated to latest MFA version? Yes
[ ] Have you tried rerunning the command with the --clean flag? Yes
Describe the issue
A clear and concise description of what the bug is.