MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Forced alignment in a native application #530

Open DanielSWolf opened 1 year ago

DanielSWolf commented 1 year ago

I'm writing a native application (Rust, C/C++) that needs to perform forced alignment. Not training, just the alignment part. So I'm wondering how best to integrate MFA into an application.

My understanding is that most of MFA's code is for training. Once the models exist, the align command seems to primarily call a number of Kaldi binaries, which in turn are just thin wrappers around the Kaldi library. So my idea is to compile Kaldi as part of my application, then call the appropriate Kaldi functionality directly. This way, the entire forced alignment functionality could reside in my application's executable.

So I'm wondering:

  1. Does this sound like a reasonable approach?
  2. If so, is there some documentation on what Kaldi binaries MFA calls during alignment? Otherwise, I'd reconstruct that sequence from the source code.

mmcauliffe commented 1 year ago

Yeah, that sounds reasonable. The kaldi feedstock on conda-forge should have shared libraries (.so/.dll/.dylib) that you can link against when building (including the OpenFst .16.so libraries), and they should export all the symbols you need (at least they export the ones that the binaries use).

In terms of alignment, there is some documentation with source code here: https://montreal-forced-aligner.readthedocs.io/en/latest/reference/alignment/generated/montreal_forced_aligner.alignment.mixins.AlignMixin.html#montreal_forced_aligner.alignment.mixins.AlignMixin.align_utterances. In general, though, the alignment functions just follow https://github.com/kaldi-asr/kaldi/tree/master/egs/wsj/s5/steps/align_si.sh and https://github.com/kaldi-asr/kaldi/tree/master/egs/wsj/s5/steps/align_fmllr.sh.
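
As a rough orientation for reading those recipes, the core of `align_si.sh` with delta features boils down to two Kaldi binaries, `compile-train-graphs` piped into `gmm-align-compiled`, with the features read through `apply-cmvn` and `add-deltas`. The sketch below only builds that command line and does not run Kaldi. The directory layout, the `text.int` file (transcripts already mapped to integer word IDs) and the `ali.ark` output name are assumptions for illustration; the real script also splits the data into jobs, boosts silence with `gmm-boost-silence` and gzips the output, all of which is left out here.

```python
import shlex


def align_si_commands(lang_dir, model_dir, data_dir, out_dir,
                      beam=10, retry_beam=40):
    """Build the two Kaldi commands at the heart of align_si.sh (delta features).

    All paths are hypothetical placeholders; nothing is executed here.
    """
    # Feature rspecifier: per-speaker CMVN, then delta and delta-delta features.
    feats = (
        f"ark,s,cs:apply-cmvn --utt2spk=ark:{data_dir}/utt2spk "
        f"scp:{data_dir}/cmvn.scp scp:{data_dir}/feats.scp ark:- | "
        "add-deltas ark:- ark:- |"
    )
    # Stage 1: compile one training graph per utterance from its transcript.
    compile_graphs = [
        "compile-train-graphs",
        f"--read-disambig-syms={lang_dir}/phones/disambig.int",
        f"{model_dir}/tree",
        f"{model_dir}/final.mdl",
        f"{lang_dir}/L.fst",
        f"ark:{data_dir}/text.int",  # assumption: transcripts as integer word IDs
        "ark:-",
    ]
    # Stage 2: Viterbi-align the features against those graphs.
    align = [
        "gmm-align-compiled",
        "--transition-scale=1.0",
        "--acoustic-scale=0.1",
        "--self-loop-scale=0.1",
        f"--beam={beam}",
        f"--retry-beam={retry_beam}",
        f"{model_dir}/final.mdl",
        "ark:-",                     # graphs arrive on stdin from stage 1
        feats,
        f"ark:{out_dir}/ali.ark",    # assumption: uncompressed alignment archive
    ]
    return compile_graphs, align


def to_shell_pipeline(*commands):
    """Join argument lists into one shell pipeline string, quoting as needed."""
    return " | ".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    graphs_cmd, align_cmd = align_si_commands("lang", "model", "data", "out")
    print(to_shell_pipeline(graphs_cmd, align_cmd))
```

For a native application, each of the two stages corresponds to one binary's `main()` in the Kaldi source tree, so this sequence is also a reasonable starting point for finding the library calls to make directly. `align_fmllr.sh` adds speaker-adaptive transform estimation on top of the same pattern.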

BarryKCL commented 1 year ago

Any progress? I use the Python source code instead of the command line for forced alignment, but alignment is still slow due to the Kaldi-style features.

DanielSWolf commented 1 year ago

@mmcauliffe Thank you for the Kaldi recipes and for the tip with the pre-built binaries!

@BarryKCL This is for a personal project of mine, and I don't have much free time at the moment. I'll certainly update this issue once I've made progress, but that may take some time.

BarryKCL commented 1 year ago

I found an implementation at: https://github.com/open-speech/speech-aligner