MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License

Is it possible to use trained models without the entire MFA toolkit? #621

Closed. rbracco closed this issue 1 year ago

rbracco commented 1 year ago

Is your feature request related to a problem? Please describe.

I would like to try using a trained model to generate alignments on the fly on a server, without adding the complications of conda to my deployment workflow. Is there any way to export a G2P or alignment model so it can be called from Python without all the dependencies? Is there a simpler way to set up MFA without conda?

Related issues: #40 #186 #530

mmcauliffe commented 1 year ago

The issue is that MFA is basically a wrapper around Kaldi/OpenFst, so those dependencies would have to get installed somehow (see https://montreal-forced-aligner.readthedocs.io/en/latest/installation.html#installing-from-source). I am working on a side project to get proper Python bindings for Kaldi, so that MFA doesn't have to rely on temporary files and calling Kaldi binaries, but can instead use Python calls directly. That said, I think conda is pretty necessary for handling all of the non-Python dependencies that MFA has.
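To make the "temporary files and calling Kaldi binaries" architecture concrete, here is a minimal sketch of what that wrapping pattern looks like. This is not MFA's actual code: the helper name and file layout are hypothetical, and it assumes the real Kaldi binary `compute-mfcc-feats` is on PATH (it only shells out when the binary is actually found).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def compute_mfccs(wav_path: str, out_dir: str) -> list:
    """Hypothetical MFA-style helper: write a Kaldi manifest to a temp
    location, then shell out to the compute-mfcc-feats binary."""
    out = Path(out_dir)
    # Kaldi "scp" manifest format: one "<utterance-id> <wav-path>" per line
    scp = out / "wav.scp"
    scp.write_text(f"utt1 {wav_path}\n")
    # Kaldi rspecifier/wspecifier syntax: read from scp, write an ark archive
    cmd = [
        "compute-mfcc-feats",
        f"scp:{scp}",
        f"ark:{out / 'feats.ark'}",
    ]
    # Only run the binary if Kaldi is actually installed on this machine
    if shutil.which("compute-mfcc-feats"):
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(compute_mfccs("example.wav", d))
```

The point of the bindings project mentioned above is to replace exactly this kind of subprocess-and-temp-file round trip with in-process Python calls.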

rbracco commented 1 year ago

Thank you for the clarification; this is exactly what I needed to know. Is there any way to stay current on the progress of the Python bindings project?

rbracco commented 1 year ago

I have a few related follow-up questions (and then I plan to close the issue). I was wondering...

Thank you.

mmcauliffe commented 1 year ago

Yeah, that would be functionality as part of the bindings. At the moment in MFA, all acoustic models are loaded by various Kaldi binaries; it's only G2P models that are loaded in Python.

The bindings that I've been working on are here: https://github.com/mmcauliffe/kalpy. It's definitely not fully featured yet, but I have low-level bindings for most of the Kaldi codebase working, and I'm working on getting more Pythonic interface code set up. Currently I just have code for generating MFCCs (https://github.com/mmcauliffe/kalpy/blob/main/tests/test_mfcc.py) and compiling utterance graphs (see https://github.com/mmcauliffe/kalpy/blob/main/tests/test_decoder.py).

I'm not sure exactly when I'll have it stable enough to release to pip/conda-forge. Once I finish up the last of the low-level bindings and do some performance benchmarking of the MFCC and training-graph compilation (to make sure I'm not doing anything horrendously wrong for memory or speed), I'll feel comfortable getting a release out.

rbracco commented 1 year ago

Thank you for taking the time to respond, that is very helpful. Starred and following kalpy!