MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

phones in the dictionary that do not have acoustic models while using pre-trained model and given lexicon #347

Open yanirmr opened 3 years ago

yanirmr commented 3 years ago

I am trying to follow the example in the MFA documentation.

I ran the following on my computer (Windows 10, Python 3.9, pip 21.2.4):

pip install montreal-forced-aligner
mfa download acoustic english

Then, when I execute:

mfa align path/to/dataset path/to/lexicon.txt english path/to/output

I receive the following error message:

All required kaldi binaries were found!
Cleaning old directory!
INFO - Setting up corpus information...
INFO - Found old run with 1 rather than the current 3, setting to 1.  If you would like to use 3, re-run the command with --clean.
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 1.0
INFO - Parsing dictionary without pronunciation probabilities without silence probabilities
dictionary phones: {'AH2', 'K', 'AY0', 'IH2', 'OY0', 'EY1', 'AH1', 'AO1', 'UH0', 'OY2', 'EH0', 'T', 'AE2', 'F', 'EY0', 'W', 'IY1', 'AE0', 'P', 'ER0', 'D', 'IY0', 'V', 'OW2', 'NG', 'ER1', 'IH1', 'AE1', 'G', 'IY2', 'EH2', 'OW0', 'TH', 'UW2', 'UW1', 'AH0', 'ZH', 'JH', 'Y', 'UH2', 'OY1', 'Z', 'B', 'EY2', 'AO0', 'IH0', 'OW1', 'UW0', 'ER2', 'N', 'UH1', 'AY2', 'CH', 'AY1', 'HH', 'AA2', 'AW2', 'EH1', 'M', 'S', 'DH', 'AO2', 'L', 'AW0', 'AA1', 'AA0', 'AW1', 'SH', 'R'}
model phones: set()
There were phones in the dictionary that do not have acoustic models: AA0, AA1, AA2, AE0, AE1, AE2, AH0, AH1, AH2, AO0, AO1, AO2, AW0, AW1, AW2, AY0, AY1, AY2, B, CH, D, DH, EH0, EH1, EH2, ER0, ER1, ER2, EY0, EY1, EY2, F, G, HH, IH0, IH1, IH2, IY0, IY1, IY2, JH, K, L, M, N, NG, OW0, OW1, OW2, OY0, OY1, OY2, P, R, S, SH, T, TH, UH0, UH1, UH2, UW0, UW1, UW2, V, W, Y, Z, ZH

I already tried running the command with the --clean flag, and uninstalling and reinstalling the package, but neither solved the problem. What should I do in this case?

Thanks!

(BTW, I also asked about this on Stack Overflow.)
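
The log above reports `model phones: set()`, so every phone in the dictionary ends up in the "no acoustic model" list. The following is a minimal sketch of that comparison, not MFA's actual code. It assumes a lexicon where each line is a word followed by whitespace-separated phones; `dictionary_phones` and `missing_phones` are hypothetical helper names.

```python
def dictionary_phones(lexicon_lines):
    """Collect phone symbols from lines of the form 'WORD PH1 PH2 ...'."""
    phones = set()
    for line in lexicon_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        phones.update(parts[1:])
    return phones


def missing_phones(dict_phones, model_phones):
    """Phones used by the dictionary that the acoustic model does not cover."""
    return sorted(set(dict_phones) - set(model_phones))


# Hypothetical two-word lexicon, for illustration only.
lexicon = ["HELLO HH AH0 L OW1", "WORLD W ER1 L D"]
phones = dictionary_phones(lexicon)

# An empty model phone set flags every dictionary phone as missing.
print(missing_phones(phones, set()))
# A model that covers the same phones leaves nothing missing.
print(missing_phones(phones, phones))
```

Read this way, the length of the error list says little about the lexicon itself; it mostly signals that no phones were loaded for the model.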

chaksam commented 3 years ago

When I try the following commands with the latest version of Montreal Forced Aligner, I get the following error:

mfa thirdparty download
mfa download acoustic english

mfa thirdparty download
usage: mfa [-h] {version,align,adapt,train,validate,g2p,train_g2p,model,train_lm,train_dictionary,train_ivector,classify_speakers,create_segments,transcribe,configure,history,annotator,anchor} ...
mfa: error: argument subcommand: invalid choice: 'thirdparty' (choose from 'version', 'align', 'adapt', 'train', 'validate', 'g2p', 'train_g2p', 'model', 'train_lm', 'train_dictionary', 'train_ivector', 'classify_speakers', 'create_segments', 'transcribe', 'configure', 'history', 'annotator', 'anchor')


This resolved the issue for me: mfa model download acoustic english, followed by mfa align.

mmcauliffe commented 3 years ago

Ah yes, sorry about that. MFA is currently in the middle of a transition for installation and a new version release. The model download command in the latest version is indeed mfa model download acoustic english; you can see more information about the model subcommand here: https://montreal-forced-aligner.readthedocs.io/en/latest/pretrained_models.html

@yanirmr could you try installing the latest version of MFA via conda install -c conda-forge montreal-forced-aligner and running mfa model download acoustic english to see if that resolves the issue?

chaksam commented 3 years ago

I am trying to run mfa align on the LJSpeech dataset using the pretrained English model, but it has been stuck at the "Exporting TextGrids from CTMs..." stage for more than 16 hours. The "alignments" folder contains only 756 *.TextGrid files (which were generated quickly), out of 13100 audio files in total. I then sent a keyboard interrupt and started again, and it got stuck at the same place. The command and details are shown below. What is the solution for this?


mfa align --clean /home/kwantics/NeMo/ljspeech /home/NeMo/ljspeech/uncommented_cmudict.dict english /home/NeMo/ljspeech/alignments
INFO - Setting up corpus information...
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 13100.0
INFO - Parsing dictionary "uncommented_cmudict" without pronunciation probabilities without silence probabilities
INFO - Creating dictionary information...
INFO - Setting up training data...
INFO - Generating base features (mfcc)...
INFO - Calculating CMVN...
INFO - Setting up training data...
INFO - Setting up training data...
INFO - Done with setup!
INFO - Performing first-pass alignment...
INFO - Calculating fMLLR for speaker adaptation...
INFO - Performing second-pass alignment...
INFO - Generating CTMs from alignment...
INFO - Finished generating CTMs!
INFO - Exporting TextGrids from CTMs...

"Stuck here for 16 hours"

^CProcess CombineProcessWorker-25:
Traceback (most recent call last):
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/process.py", line 318, in _bootstrap
    util._exit_function()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/queues.py", line 195, in _finalize_join
    thread.join()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/kwantics/anaconda3/envs/aligner/bin/mfa", line 8, in <module>
Process ExportTextGridProcessWorker-27:
    sys.exit(main())
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/mfa.py", line 747, in main
    run_align_corpus(args, unknown)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/align.py", line 252, in run_align_corpus
    align_corpus(args, unknown_args)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/command_line/align.py", line 191, in align_corpus
    a.export_textgrids(args.output_directory)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/aligner/base.py", line 222, in export_textgrids
    convert_ali_to_textgrids(self)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/multiprocessing/alignment.py", line 1703, in convert_ali_to_textgrids
    ctms_to_textgrids_mp(aligner)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/multiprocessing/alignment.py", line 1634, in ctms_to_textgrids_mp
    combine_procs[i].join()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
Traceback (most recent call last):
    pid, sts = os.waitpid(self.pid, flag)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/site-packages/montreal_forced_aligner/multiprocessing/alignment.py", line 1444, in run
    file_name, data = self.for_write_queue.get(timeout=queue_polling_timeout)
KeyboardInterrupt
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/queues.py", line 111, in get
    res = self._recv_bytes()
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/connection.py", line 421, in _recv_bytes
    return self._recv(size)
  File "/home/kwantics/anaconda3/envs/aligner/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
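
Since only 756 of the 13100 files received a TextGrid, it may help to list which utterances are still missing, for example to check whether the export always stops at the same file. This is a rough sketch that compares file stems between the corpus and output directories. It assumes .wav audio, and `files_missing_textgrids` is a hypothetical helper, not part of MFA.

```python
from pathlib import Path


def files_missing_textgrids(corpus_dir, output_dir, audio_ext=".wav"):
    """Return sorted stems of audio files that have no matching .TextGrid."""
    corpus_dir, output_dir = Path(corpus_dir), Path(output_dir)
    # Stems of every TextGrid written so far, searched recursively.
    done = {p.stem for p in output_dir.rglob("*.TextGrid")}
    return sorted(
        p.stem for p in corpus_dir.rglob(f"*{audio_ext}") if p.stem not in done
    )


if __name__ == "__main__":
    import tempfile

    # Tiny self-contained demo with empty placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        corpus, out = Path(tmp, "corpus"), Path(tmp, "alignments")
        corpus.mkdir()
        out.mkdir()
        for name in ("a", "b", "c"):
            (corpus / f"{name}.wav").touch()
        (out / "a.TextGrid").touch()
        print(files_missing_textgrids(corpus, out))
```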

mmcauliffe commented 3 years ago

I'll look into it. In the meantime, you can rerun it with the --disable_mp flag; it looks like something isn't working with one of the multiprocessing threads.

yanirmr commented 3 years ago

@mmcauliffe Thank you for your response. Unfortunately, that did not resolve the problem. I get the same error:

Cleaning old directory!
INFO - Setting up corpus information...
INFO - Found old run with 1 rather than the current 3, setting to 1.  If you would like to use 3, re-run the command with --clean.
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 1.0
INFO - Parsing dictionary "librispeech-lexicon" without pronunciation probabilities without silence probabilities
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: AA0, AA1, AA2, AE0, AE1, AE2, AH0, AH1, AH2, AO0, AO1, AO2, AW0, AW1, AW2, AY0, AY1, AY2, B, CH, D, DH, EH0, EH1, EH2, ER0, ER1, ER2, EY0, EY1, EY2, F, G, HH, IH0, IH1, IH2, IY0, IY1, IY2, JH, K, L, M, N, NG, OW0, OW1, OW2, OY0, OY1, OY2, P, R, S, SH, T, TH, UH0, UH1, UH2, UW0, UW1, UW2, V, W, Y, Z, and ZH

mmcauliffe commented 3 years ago

Hmm, odd, it says that it's cleaning the old directory, but it's loading from the old run still. Can you try it with --clean to see? I hate to lose the time processing it, but it might be the best to start from scratch.

Also, can you give me some details about the dataset? It looks like it's a single speaker with 13100 files, is that correct? Or should there be more speakers? MFA isn't super optimized for single speaker cases, but I'll see if I can replicate it this weekend. Any other details or data transformations you did?

yanirmr commented 3 years ago

Sorry, I ran it again with and without the --clean flag. Take a look, please. (The dataset is from LibriVox: a chapter of an audiobook, single speaker. The text is from Project Gutenberg.)

(project-env) C:\...\myProject>mfa align datasets/gulliver_intro datasets/librispeech-lexicon.txt english outputs/aligned_gulliver_test
WARNING - WARNING: Using old temp directory, this might not be ideal for you, use the --clean flag to ensure no weird behavior for previous versions of the temporary directory.
INFO - Setting up corpus information...
INFO - Found old run with 1 rather than the current 3, setting to 1.  If you would like to use 3, re-run the command with --clean.
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 1.0
INFO - Parsing dictionary "librispeech-lexicon" without pronunciation probabilities without silence probabilities
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: AA0, AA1, AA2, AE0, AE1, AE2, AH0, AH1, AH2, AO0, AO1, AO2, AW0, AW1, AW2, AY0, AY1, AY2, B, CH, D, DH, EH0, EH1, EH2, ER0, ER
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: AA0, AA1, AA2, AE0, AE1, AE2, AH0, AH1, AH2, AO0, AO1, AO2, AW0, AW1, AW2, AY0, AY1, AY2, B, CH, D, DH, EH0, EH1, EH2, ER0, ER1, ER2, EY0, EY1, EY2, F, G, HH, IH0, IH1, IH2, IY0, IY1, IY2, JH, K, L, M, N, NG, OW0, OW1, OW2, OY0, OY1, OY2, P, R, S, SH, T, TH, UH0, UH1, UH2, UW0, UW1, UW2, V, W, Y, Z, and ZH

(project-env) C:\...\myProject>mfa align datasets/gulliver_intro datasets/librispeech-lexicon.txt english outputs/aligned_gulliver_test --clean
Cleaning old directory!
INFO - Setting up corpus information...
INFO - Found old run with 1 rather than the current 3, setting to 1.  If you would like to use 3, re-run the command with --clean.
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 1.0
INFO - Parsing dictionary "librispeech-lexicon" without pronunciation probabilities without silence probabilities
PronunciationAcousticMismatchError: There were phones in the dictionary that do not have acoustic models: AA0, AA1, AA2, AE0, AE1, AE2, AH0, AH1, AH2, AO0, AO1, AO2, AW0, AW1, AW2, AY0, AY1, AY2, B, CH, D, DH, EH0, EH1, EH2, ER0, ER1, ER2, EY0, EY1, EY2, F, G, HH, IH0, IH1, IH2, IY0, IY1, IY2, JH, K, L, M, N, NG, OW0, OW1, OW2, OY0, OY1, OY2, P, R, S, SH, T, TH, UH0, UH1, UH2, UW0, UW1, UW2, V, W, Y, Z, and ZH

mmcauliffe commented 3 years ago

Maybe try manually deleting the ~/Documents/MFA/gulliver_intro folder? The line:

INFO - Found old run with 1 rather than the current 3, setting to 1.  If you would like to use 3, re-run the command with --clean.

is only output when MFA is loading the corpus from temporary files, which makes me think that the --clean wasn't successful for some reason, and that might be the root cause.
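
For anyone who prefers to script the manual cleanup, here is a minimal sketch. It assumes the temporary files live under ~/Documents/MFA/ in a folder named after the corpus, as in the path mentioned above; `remove_mfa_temp_dir` and its `temp_root` and `dry_run` parameters are hypothetical and not part of MFA. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def remove_mfa_temp_dir(corpus_name, temp_root=None, dry_run=True):
    """Delete <temp_root>/<corpus_name> if it exists and return its path.

    Returns None when there is nothing to delete. With dry_run=True the
    folder is only reported, not removed.
    """
    # Assumed default location, taken from the path mentioned in this thread.
    root = Path(temp_root) if temp_root else Path.home() / "Documents" / "MFA"
    target = root / corpus_name
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)
    return target


if __name__ == "__main__":
    # Dry run: prints the folder that would be removed, or None.
    print(remove_mfa_temp_dir("gulliver_intro"))
```

Calling it with `dry_run=False` performs the actual deletion; rerunning mfa align afterwards should then start from a fresh temporary directory if the stale folder was indeed the cause.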

yanirmr commented 2 years ago

This problem occurs whenever I attempt to run MFA on data that has already been processed by MFA. The only solution that works for me is to copy all of the data to a different path or to rename the data path.
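
The copy workaround can be sketched as follows. It rests on the assumption, suggested by the ~/Documents/MFA/gulliver_intro path earlier in this thread, that a differently named corpus folder does not pick up the old temporary files. `copy_corpus_to_fresh_path` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_corpus_to_fresh_path(corpus_dir, suffix="_copy"):
    """Copy corpus_dir to a new sibling folder with an unused name."""
    src = Path(corpus_dir)
    candidate = src.with_name(src.name + suffix)
    n = 1
    # Append a counter until the name is free, so each run gets a new folder.
    while candidate.exists():
        n += 1
        candidate = src.with_name(f"{src.name}{suffix}{n}")
    shutil.copytree(src, candidate)
    return candidate


if __name__ == "__main__":
    import tempfile

    # Self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        corpus = Path(tmp, "gulliver_intro")
        corpus.mkdir()
        (corpus / "chapter.wav").touch()
        print(copy_corpus_to_fresh_path(corpus).name)
```

The returned path can then be passed to mfa align in place of the original corpus directory.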