MontrealCorpusTools / Montreal-Forced-Aligner

Command line utility for forced alignment using Kaldi
https://montrealcorpustools.github.io/Montreal-Forced-Aligner/
MIT License
1.27k stars, 242 forks

About building Kaldi binaries for MFA #330

Open jichuanman opened 2 years ago

jichuanman commented 2 years ago

Hi, I am trying to get MFA to work in my Docker image (which, for various reasons, must be based on Ubuntu 16.04). The prebuilt binaries didn't work because of the glibc version, so I built the binaries from scratch, and that worked. But after I removed the Kaldi directory from the image (to keep the image as small as possible), MFA throws errors like "No such file or directory: '/root/Documents/MFA/xxx/corpus_data/split1/feats.0.scp'". I figured out that the binaries (such as compute-mfcc-feats and gmm-align-compiled) were not working, probably because the shared libraries they depend on are missing:

root@55f44fdffeb2:/app/core# ldd /root/Documents/MFA/thirdparty/bin/compute-mfcc-feats
        linux-vdso.so.1 =>  (0x00007ffe50036000)
        libkaldi-feat.so => not found
        libkaldi-util.so => not found
        libkaldi-matrix.so => not found
        libkaldi-base.so => not found
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f364b000000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f364ac7e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f364aa68000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f364a69e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f364b21d000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f364a395000)
root@55f44fdffeb2:/app/core# ldd /root/Documents/MFA/thirdparty/bin/gmm-align-compiled
        linux-vdso.so.1 =>  (0x00007ffd3a87a000)
        libkaldi-decoder.so => not found
        libkaldi-hmm.so => not found
        libkaldi-gmm.so => not found
        libkaldi-util.so => not found
        libkaldi-matrix.so => not found
        libkaldi-base.so => not found
        libfst.so.16 => not found
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb3335c9000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb333247000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb333031000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb332c67000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fb3337e6000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb33295e000)
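
The `not found` entries above are the Kaldi and OpenFst shared libraries that disappeared together with the Kaldi directory. As a rough sketch (not something from the MFA or Kaldi documentation), the missing names can be pulled out of the `ldd` output and copied into a separate directory before the build tree is deleted. The function names `missing_libs` and `collect_libs` and the `/path/to/...` directories are hypothetical placeholders.

```shell
# List the libraries that ldd reports as "not found" (reads ldd output on stdin).
missing_libs() {
    awk '/=> not found/ { print $1 }' | sort -u
}

# Copy each library named on stdin from SRC_DIR (searched recursively) into DEST_DIR.
# Both directory arguments are placeholders; adjust them to the actual build tree.
collect_libs() {
    src_dir="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir"
    while IFS= read -r lib; do
        # Take the first match; a build tree may contain several copies or symlinks.
        path=$(find "$src_dir" -name "$lib" | head -n 1)
        if [ -n "$path" ]; then
            # -L copies the file a symlink points to rather than the link itself.
            cp -L "$path" "$dest_dir/"
        else
            echo "not found under $src_dir: $lib" >&2
        fi
    done
}

# Example usage (run while the Kaldi build tree still exists):
# ldd /root/Documents/MFA/thirdparty/bin/compute-mfcc-feats \
#     | missing_libs | collect_libs /path/to/kaldi /path/to/libs
# export LD_LIBRARY_PATH=/path/to/libs
```

With the copied libraries on `LD_LIBRARY_PATH`, the dynamic linker should be able to resolve them, and `ldd` can be rerun to check that no `not found` entries remain.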

So I added flags like "--static" to the Kaldi configure script and built again. That solved the problem, but the collected binaries became very large (~1.4G), whereas the prebuilt ones are only ~300M.
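
One possible contributor to the size difference, offered here as an assumption rather than anything confirmed in this thread, is symbol and debug information left in the statically linked binaries. The sketch below runs the standard binutils `strip` tool over the executables in a directory so the sizes can be compared before and after. The function name `strip_binaries` is hypothetical.

```shell
# Strip symbols from every regular executable file in DIR (non-recursive).
# Files that strip cannot process (scripts, for example) are reported and left alone.
strip_binaries() {
    dir="$1"
    for f in "$dir"/*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            strip "$f" 2>/dev/null || echo "skipped: $f" >&2
        fi
    done
}

# Example usage (work on a copy, since strip modifies files in place):
# du -sh /root/Documents/MFA/thirdparty/bin
# strip_binaries /root/Documents/MFA/thirdparty/bin
# du -sh /root/Documents/MFA/thirdparty/bin
```

If the sizes barely change, the difference more likely comes from the build configuration itself, which is the flag discussed in the reply below.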

I am wondering how to configure and build Kaldi correctly to get binaries like the ones I get via `mfa thirdparty download`. Any help or guidance is appreciated.

By the way, thanks for the great tool! MFA really helps me a lot in my TTS project.

mmcauliffe commented 2 years ago

Oh man, right, I remember getting the massive binaries, and there was indeed some flag that I had to change. I'll try to dig through my build directory and see if I have a record of it. The binaries should be static, except for OpenBLAS, iirc, but I don't remember what the flag was. I do know the build was done the CMake way.

huydang2106 commented 1 year ago

Could you please help me with building these Kaldi binaries?