MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

[BUG] ngramcount FATAL: SetFlags: Bad option: --require_symbols #764

Closed by AlienKevin 7 months ago

AlienKevin commented 7 months ago

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version? What is the output of mfa version?
[x] Have you tried rerunning the command with the --clean flag?

Describe the issue

Setup:

conda create -n aligner3 -c conda-forge montreal-forced-aligner=3.0.0
conda activate aligner3

For MFA 3.0.0, running a normal validate command throws an error during phone LM training. Command:

mfa validate train_wavs lexicon.txt --clean

Error:

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 102, in run
    self._run()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/language_modeling/multiprocessing.py", line 241, in _run
    self.check_call(ngramcount_proc)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 129, in check_call
    raise KaldiProcessingError([self.log_path])
montreal_forced_aligner.exceptions.KaldiProcessingError: KaldiProcessingError:

There were 1 job(s) with errors when running Kaldi binaries.
See the log files below for more information.
/Users/kevin/Documents/MFA/train_wavs/monophone_ali/log/ngram_count.5.log

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/kevin/miniconda3/envs/aligner3/bin/mfa", line 10, in <module>
    sys.exit(mfa_cli())
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/command_line/validate.py", line 112, in validate_corpus_cli
    validator.validate()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/validation/corpus_validator.py", line 512, in validate
    self.train()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 559, in train
    self.finalize_training()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/acoustic_modeling/trainer.py", line 628, in finalize_training
    self.train_phone_lm()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/transcription/transcriber.py", line 317, in train_phone_lm
    raise v
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/utils.py", line 540, in run
    self.function.run()
  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 106, in run
    raise MultiprocessingError(self.job_name, error_text)
montreal_forced_aligner.exceptions.MultiprocessingError: MultiprocessingError:

Job 5 encountered an error:
Traceback (most recent call last):

  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 102, in run
    self._run()

  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/language_modeling/multiprocessing.py", line 241, in _run
    self.check_call(ngramcount_proc)

  File "/Users/kevin/miniconda3/envs/aligner3/lib/python3.9/site-packages/montreal_forced_aligner/abc.py", line 129, in check_call
    raise KaldiProcessingError([self.log_path])

montreal_forced_aligner.exceptions.KaldiProcessingError: KaldiProcessingError:

There were 1 job(s) with errors when running Kaldi binaries.
See the log files below for more information.
/Users/kevin/Documents/MFA/train_wavs/monophone_ali/log/ngram_count.5.log

The log file says: FATAL: SetFlags: Bad option: --require_symbols=false

I traced the error to this call in language_modeling/multiprocessing.py:

ngramcount_proc = subprocess.Popen(
    [
        thirdparty_binary("ngramcount"),
        "--require_symbols=false",
        "--round_to_int",
        f"--order={self.order}",
        "-",
        self.working_directory.joinpath(f"{self.job_name}.cnts"),
    ],

I ran ngramcount --help to check whether the require_symbols option exists, and found that it is indeed missing: the PROGRAM FLAGS section is empty.

ngramcount --help
Count n-grams from input file.

  Usage: ngramcount [--options] [in.far [out.fst]]

PROGRAM FLAGS:

LIBRARY FLAGS:

Flags from: /Users/runner/miniforge3/conda-bld/openfst_1706373532578/work/src/lib/flags.cc
  --help: type = bool, default = false
  show usage information
  --helpshort: type = bool, default = false
  show brief usage information
  --tmpdir: type = std::string, default = "/var/folders/kk/n4ff6h1n3t170b1m4zv09yf40000gn/T/"
  temporary directory
  --v: type = int32_t, default = 0
  verbosity level
...

In comparison, the older MFA v3.0.0a8, which uses ngram=1.3.14, prints the expected output:

ngramcount --help
Count n-grams from input file.

  Usage: ngramcount [--options] [in.far [out.fst]]

PROGRAM FLAGS:

  --add_to_symbol_unigram_count: type = double, default = 0
  Adds this amount to the unigram count of each word in the symbol table
  --alpha: type = double, default = 1
  Weight for first FST
  --backoff_label: type = int64_t, default = 0
  Backoff label
  --beta: type = double, default = 1
  Weight for second (and subsequent) FST(s)
  --check_consistency: type = bool, default = false
  Check model consistency
  --context_pattern: type = std::string, default = ""
  Pattern of contexts to count
  --epsilon_as_backoff: type = bool, default = false
  Treat epsilon in the input Fsts as backoff
  --method: type = std::string, default = "counts"
  One of: "counts", "histograms", "count_of_counts", "count_of_histograms"
  --norm_eps: type = double, default = 0.001
  Normalization check epsilon
  --normalize: type = bool, default = false
  Normalize resulting model
  --order: type = int64_t, default = 3
  Set maximal order of ngrams to be counted
  --output_fst: type = bool, default = true
  Output counts as fst (otherwise strings)
  --require_symbols: type = bool, default = true
  Require symbol tables? (default: yes)
  --round_to_int: type = bool, default = false
  Round all counts to integers

LIBRARY FLAGS:

Flags from: /Users/runner/miniforge3/conda-bld/openfst_1659884124199/work/src/lib/flags.cc
  --help: type = bool, default = false
  show usage information
  --helpshort: type = bool, default = false
  show brief usage information
  --tmpdir: type = std::string, default = "/var/folders/kk/n4ff6h1n3t170b1m4zv09yf40000gn/T/"
  temporary directory
  --v: type = int32_t, default = 0
  verbosity level
...

AlienKevin commented 7 months ago

I tested MFA 3.0.0 on Linux x86 and could not reproduce the issue there. Oddly, ngramcount seems to be broken only on macOS.

mmcauliffe commented 7 months ago

Can you check other ngram binaries to see if they're all suffering from lack of flags? ngrammake, ngrammerge, ngramsymbols, ngramshrink, etc.

AlienKevin commented 7 months ago

All the commands you listed (ngrammake, ngrammerge, ngramsymbols, ngramshrink) are missing program flags.
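
The manual check above (running each binary with --help and looking at the PROGRAM FLAGS section) could be scripted roughly as follows. This is only a sketch: the helper names program_flags and check_binaries are made up for illustration, and it assumes the help text uses the PROGRAM FLAGS: / LIBRARY FLAGS: layout shown earlier in this issue. Because it is not certain which stream the help text is written to, both stdout and stderr are scanned.

```python
import shutil
import subprocess


def program_flags(help_text):
    """Return the flag names listed between 'PROGRAM FLAGS:' and 'LIBRARY FLAGS:'.

    Hypothetical helper: it assumes the help layout quoted in this issue,
    where each flag line looks like '--name: type = ..., default = ...'.
    """
    flags = []
    in_section = False
    for line in help_text.splitlines():
        stripped = line.strip()
        if stripped == "PROGRAM FLAGS:":
            in_section = True
            continue
        if stripped == "LIBRARY FLAGS:":
            break
        if in_section and stripped.startswith("--") and ":" in stripped:
            flags.append(stripped.split(":", 1)[0])
    return flags


def check_binaries(names):
    """Map each binary name to its program flags, or None if it is not on PATH."""
    report = {}
    for name in names:
        path = shutil.which(name)
        if path is None:
            report[name] = None
            continue
        proc = subprocess.run(
            [path, "--help"], capture_output=True, text=True, timeout=30
        )
        # Scan both streams, since the help text may go to either one.
        report[name] = program_flags(proc.stdout + "\n" + proc.stderr)
    return report


if __name__ == "__main__":
    binaries = ["ngramcount", "ngrammake", "ngrammerge", "ngramsymbols", "ngramshrink"]
    for name, flags in check_binaries(binaries).items():
        if flags is None:
            print(f"{name}: not found on PATH")
        elif not flags:
            print(f"{name}: PROGRAM FLAGS section is empty")
        else:
            print(f"{name}: {len(flags)} program flags, e.g. {flags[0]}")
```

Run inside the activated conda environment (aligner3 in this report), an empty list for a binary would correspond to the blank PROGRAM FLAGS section shown above.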

mmcauliffe commented 7 months ago

Any issues with the binaries should now be fixed; run conda update openfst ngram baumwelch to pick up the new builds.
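
After updating, one way to see which builds of these packages ended up in the environment is to read the JSON output of conda list. The sketch below is an illustration only: installed_versions is a hypothetical helper, and the thread does not say which versions or build strings contain the fix, so the script only reports what is installed.

```python
import json
import subprocess


def installed_versions(conda_list_json, names):
    """Return {package: (version, build_string)} for the requested package names.

    Hypothetical helper; expects the JSON array printed by `conda list --json`,
    where each entry has at least "name" and "version" keys.
    """
    wanted = set(names)
    return {
        pkg["name"]: (pkg["version"], pkg.get("build_string", ""))
        for pkg in json.loads(conda_list_json)
        if pkg["name"] in wanted
    }


if __name__ == "__main__":
    packages = ["openfst", "ngram", "baumwelch"]
    try:
        result = subprocess.run(
            ["conda", "list", "--json"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run conda list: {exc}")
    else:
        found = installed_versions(result.stdout, packages)
        for name in packages:
            if name in found:
                version, build = found[name]
                print(f"{name}: {version} ({build})")
            else:
                print(f"{name}: not installed in this environment")
```

Comparing the output before and after the update shows whether new builds were actually installed; rerunning ngramcount --help then confirms whether the program flags are back.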