MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Evaluation documentation read.the.docs missing information #823

Open chirila opened 1 month ago

chirila commented 1 month ago

Is your feature request related to a problem? Please describe.

I'm interested in comparing alignment methods for some data so the "alignment evaluation" option is a very welcome addition (thank you Michael!). However, the information about running it seems incomplete.

https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/implementations/alignment_evaluation.html

says "Alignments can be compared to a gold-standard reference set by specifying the --reference_directory below. MFA will load all TextGrids and parse them as if they were exported by MFA (i.e., phone and speaker tiers per speaker)."

But there's no information about the commands to run (nor in the list of all command options, as far as I can find). The phone_model_alignments page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/implementations/phone_models.html#) refers to this method but doesn't give further information. There is, however, an option to specify a reference directory in mfa align; is that sufficient to trigger the comparison?

Describe the solution you'd like

Could you provide a sample command set (or link to where it's given in the docs)?

mmcauliffe commented 1 month ago

Right, I can expand that documentation to point to the benchmarking scripts that I have in the mfa-models repo. Supplying --reference_directory is sufficient on its own, as long as the phone set used in the reference directory is the same as the one used in the mfa align command. If not, then you'll want to construct a mapping file like mfa_buckeye_mapping.yaml.

In the meantime, you can see some examples of using the alignment evaluation in the scripts/alignment_benchmarks folder of mfa-models, specifically benchmark_english_alignments.py. There are also example mapping files where needed, along with data prep scripts for Buckeye, TIMIT, CSJ, and the Seoul Corpus that convert their phone alignments into MFA format for use with --reference_directory.
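For anyone landing here before the docs are expanded, here is a minimal sketch of what such a call might look like, assembled in Python so it can be dropped into a benchmarking script. The positional argument order follows the usual `mfa align` usage (corpus, dictionary, acoustic model, output). The helper name `build_align_command` and all paths are hypothetical, and `--custom_mapping_path` is an assumption about how a mapping file is passed; check `mfa align --help` for your installed version before relying on it.

```python
import shlex
from typing import List, Optional


def build_align_command(
    corpus_dir: str,
    dictionary: str,
    acoustic_model: str,
    output_dir: str,
    reference_dir: str,
    mapping_path: Optional[str] = None,
) -> List[str]:
    """Assemble an `mfa align` call that also points at reference TextGrids.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run MFA.
    """
    cmd = [
        "mfa", "align",
        corpus_dir, dictionary, acoustic_model, output_dir,
        # Per the thread, supplying the reference directory is what
        # triggers the comparison against gold-standard alignments.
        "--reference_directory", reference_dir,
    ]
    if mapping_path is not None:
        # Assumed flag name; only needed when the reference phone set
        # differs from the one used by the dictionary and acoustic model.
        cmd += ["--custom_mapping_path", mapping_path]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and model names; substitute your own.
    cmd = build_align_command(
        "corpus/", "english_mfa", "english_mfa", "aligned_output/",
        reference_dir="reference_textgrids/",
        mapping_path="mfa_buckeye_mapping.yaml",
    )
    # Print the shell-quoted command for inspection or copy-pasting.
    print(shlex.join(cmd))
```

To actually run it, the list can be handed to `subprocess.run(cmd, check=True)`. When the reference TextGrids already use the same phone set as the model, leave `mapping_path` unset.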

chirila commented 1 month ago

much appreciated!