flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Loading model files such as acoustic model | getting kenLM related issues for libraries #866

Open tumusudheer opened 3 years ago

tumusudheer commented 3 years ago

Question

I'm using Ubuntu 18.04 and the wav2letter v0.2 branch. I've successfully compiled and built wav2letter on my machine. Now I'm working on building the inference example as a standalone C++ file outside the wav2letter build environment. My C++ code:

#include <stdio.h>
#include <stdlib.h>
#include <atomic>
#include <fstream>
#include <iostream>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

#include <cereal/archives/binary.hpp>
#include <cereal/archives/json.hpp>

#include "inference/decoder/Decoder.h"
#include "inference/examples/AudioToWords.h"
#include "inference/examples/Util.h"
#include "inference/examples/threadpool/ThreadPool.h"
#include "inference/module/feature/feature.h"
#include "inference/module/module.h"
#include "inference/module/nn/nn.h"

using namespace w2l;
using namespace w2l::streaming;

int main(int argc, char **argv)
{
    std::cout << "Hello!!!" << "\n";

    std::vector<std::string> inputFiles;
    std::string path = "/tmp/insert_brief_exam.wav";
    inputFiles.push_back(path);
    const size_t inputFileCount = inputFiles.size();
    std::cout << "Will process " << inputFileCount << " files." << std::endl;

    std::shared_ptr<streaming::Sequential> featureModule;
    std::shared_ptr<streaming::Sequential> acousticModule;

    // Read files
    {
        std::string feature_module_file = "Models/feature_extractor.bin";
        TimeElapsedReporter featuresLoadingElapsed("features model file loading");
        std::ifstream featFile(feature_module_file, std::ios::binary);
        if (!featFile.is_open()) {
          throw std::runtime_error(
              "failed to open feature file=" +
              feature_module_file + " for reading");
        }
        cereal::BinaryInputArchive ar(featFile);
        ar(featureModule);
    }

    {
        std::string acoustic_module_file = "Models/acoustic_model.bin";
        TimeElapsedReporter acousticLoadingElapsed("acoustic model file loading");
        std::ifstream amFile(acoustic_module_file, std::ios::binary);
        if (!amFile.is_open()) {
          throw std::runtime_error(
              "failed to open acoustic model file=" +
              acoustic_module_file + " for reading");
        }
        cereal::BinaryInputArchive ar(amFile);
        ar(acousticModule);
    }

    // Chain both modules together into a single DNN.
    auto dnnModule = std::make_shared<streaming::Sequential>();
    dnnModule->add(featureModule);
    dnnModule->add(acousticModule);

    std::string token_file = "Models/tokens.txt";
    std::string lexicon_file = "Models/lexicon.txt";
    std::string language_model_file = "Models/language_model.bin";
    std::string silence_token = "_";
    std::vector<std::string> tokens;
    {
        TimeElapsedReporter acousticLoadingElapsed("tokens file loading");
        std::ifstream tknFile(token_file);
        if (!tknFile.is_open()) {
          throw std::runtime_error(
              "failed to open tokens file=" +
              token_file + " for reading");
        }
        std::string line;
        while (std::getline(tknFile, line)) {
          tokens.push_back(line);
        }
    }
    int nTokens = tokens.size();
    std::cout << "Tokens loaded - " << nTokens << " tokens" << std::endl;

    DecoderOptions decoderOptions;
    {
        std::string decoder_options_file = "Models/decoder_options.json";
        TimeElapsedReporter decoderOptionsElapsed("decoder options file loading");
        std::ifstream decoderOptionsFile(decoder_options_file);
        if (!decoderOptionsFile.is_open()) {
          throw std::runtime_error(
              "failed to open decoder options file=" +
              decoder_options_file + " for reading");
        }
        cereal::JSONInputArchive ar(decoderOptionsFile);
        // TODO: factor out proper serialization functionality or Cereal
        // specialization.
        ar(cereal::make_nvp("beamSize", decoderOptions.beamSize),
           cereal::make_nvp("beamSizeToken", decoderOptions.beamSizeToken),
           cereal::make_nvp("beamThreshold", decoderOptions.beamThreshold),
           cereal::make_nvp("lmWeight", decoderOptions.lmWeight),
           cereal::make_nvp("wordScore", decoderOptions.wordScore),
           cereal::make_nvp("unkScore", decoderOptions.unkScore),
           cereal::make_nvp("silScore", decoderOptions.silScore),
           cereal::make_nvp("eosScore", decoderOptions.eosScore),
           cereal::make_nvp("logAdd", decoderOptions.logAdd),
           cereal::make_nvp("criterionType", decoderOptions.criterionType));
    }

    std::vector<float> transitions;
    std::string architecture_file = "";//"Models/tds_streaming.arch";
    if (!architecture_file.empty())
    {
        TimeElapsedReporter acousticLoadingElapsed("transitions file loading");
        std::ifstream transitionsFile(architecture_file, std::ios::binary);
        if (!transitionsFile.is_open()) {
          throw std::runtime_error(
              "failed to open transition parameter file=" +
              architecture_file + " for reading");
        }
        cereal::BinaryInputArchive ar(transitionsFile);
        ar(transitions);
    }

    std::shared_ptr<const DecoderFactory> decoderFactory;
    // Create Decoder
    {
        TimeElapsedReporter acousticLoadingElapsed("create decoder");
        decoderFactory = std::make_shared<DecoderFactory>(
            token_file,
            lexicon_file,
            language_model_file,
            transitions,
            SmearingMode::MAX,
            silence_token,
            0);
    }

    return 0;
}
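
The loading blocks above repeat the same open-and-check pattern for each file. As a minimal sketch, assuming only the standard library, that pattern could be factored into one helper. The name `openOrThrow` and its parameters are hypothetical and not part of wav2letter:

```cpp
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of wav2letter: opens `path` for reading and
// throws std::runtime_error with a descriptive message if that fails.
// `what` names the kind of file (e.g. "tokens") for the error message.
inline std::ifstream openOrThrow(
    const std::string& path,
    const std::string& what,
    std::ios::openmode mode = std::ios::in) {
  std::ifstream file(path, mode);
  if (!file.is_open()) {
    throw std::runtime_error(
        "failed to open " + what + " file=" + path + " for reading");
  }
  return file;  // std::ifstream is movable, so returning by value is fine
}
```

With such a helper, the tokens block could shrink to `std::ifstream tknFile = openOrThrow(token_file, "tokens");`, and the blocks that feed a cereal binary archive would pass `std::ios::binary` as the third argument.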

And this is the g++ command I'm using to compile my main file:

g++ '--std=c++1z' -Wall -Wno-sign-compare -Wno-misleading-indentation -O3 src/main.cpp -o bin/main.out -I include -I external/cereal/src/cereal/include -I external/wav2letter/include -I/data/Self/facebook/standalone/external/wav2letter/include/inference/module/fbgemm/src/fbgemm/include/ -I/data/Self/facebook/standalone/external/wav2letter/include/inference/module/fbgemm/src/fbgemm/third_party/cpuinfo/include/ -I /opt/intel/mkl/include/ -L external/wav2letter/lib/ -L/opt/intel/mkl/lib/intel64/ -L/data/Self/facebook/standalone/external/kenlm/lib/ -lwav2letter++ -lwav2letter-inference -lstreaming_inference_common -lstreaming_inference_modules_nn_backend -lstreaming_inference_modules_feature -lstreaming_inference_modules_nn_impl -lwav2letter-libraries -lutil_example -laudio_to_words_example -lclog -lcpuinfo_internals -lcpuinfo -lasmjit -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -lcublas -lm -fopenmp -lfftw3 -lfbgemm -lstreaming_inference_common -lopenblas -llapack -lm -ldl

I was able to load the acoustic model, decoder params and token files, but when I try to create the decoder, I get the following errors while running my g++ command:

external/wav2letter/lib//libwav2letter-inference.a(KenLM.cpp.o): In function `w2l::KenLM::KenLM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, w2l::Dictionary const&)':
KenLM.cpp:(.text+0x118): undefined reference to `lm::ngram::Config::Config()'
KenLM.cpp:(.text+0x125): undefined reference to `lm::ngram::LoadVirtual(char const*, lm::ngram::Config const&, lm::ngram::ModelType)'
collect2: error: ld returned 1 exit status

My kenlm build directory has the following files:

libkenlm.a
libkenlm_builder.a
libkenlm_filter.a
libkenlm_util.a

After a bit of searching, I tried adding -lkenlm_util -lkenlm -lkenlm_builder -lkenlm_filter to my g++ command, but that gives a lot more errors. May I know if I'm doing something wrong?

Thank you

tumusudheer commented 3 years ago

I was able to compile the code with the following g++ command and run inference on a wav file:

g++ '--std=c++1z' -Wall -Wno-sign-compare -Wno-misleading-indentation -O3 src/main.cpp -o bin/main.out -I include -I external/cereal/src/cereal/include -I external/wav2letter/include -I/data/Self/maneesh/facebook/standalone/external/wav2letter/include/inference/module/fbgemm/src/fbgemm/include/ -I/data/Self/maneesh/facebook/standalone/external/wav2letter/include/inference/module/fbgemm/src/fbgemm/third_party/cpuinfo/include/ -I /opt/intel/mkl/include/ -L external/wav2letter/lib/ -L/opt/intel/mkl/lib/intel64/ -L/data/Self/maneesh/facebook/standalone/external/kenlm/lib/ -lwav2letter++ -lwav2letter-inference -lstreaming_inference_common -lstreaming_inference_modules_nn_backend -lstreaming_inference_modules_feature -lstreaming_inference_modules_nn_impl -lwav2letter-libraries -lutil_example -laudio_to_words_example -lclog -lcpuinfo_internals -lcpuinfo -lasmjit -lmkl_gf_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -lcublas -lm -fopenmp -lfftw3 -lfbgemm -lstreaming_inference_common -lopenblas -llapack -lkenlm_filter -lkenlm_builder -lkenlm -lkenlm_util -llzma -lbz2 -lz -lm -ldl

This should help others build the wav2letter inference code as a standalone application.

Also, a quick question: my machine runs Ubuntu 18.04. Should I use -lmkl_gnu_thread or -lmkl_intel_thread?

tetiana-myronivska commented 3 years ago

Hi @tumusudheer,

This is great stuff that you posted! If I understand correctly, you are breaking the code in SimpleStreamingASRExample.cpp into two parts: first, loading the model (your main file above), and second, doing the inference itself with audioStreamToWordsStream(). If that's the case, could you please share how you do the inference part?

I am currently working on a similar problem: using the streaming inference code, but in a Python environment.

tumusudheer commented 3 years ago

Hi @tetiana-myronivska,

Thank you. You are correct: my intention is to separate the initialization part (which should be executed only once, at startup) from the inference part. I've not started implementing the inference part yet, but if you paste that part, or similar code, from SimpleStreamingASRExample.cpp, the code should compile and you should be able to run the usual inference example provided by the wav2letter team.
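
The split described here (load everything once, then run inference per input) might be organized roughly as follows. This is a minimal sketch that compiles without wav2letter: `AsrContext`, `Recognizer` and `InferFn` are hypothetical names, and the injected callable only stands in for the real inference call, which in an actual program would presumably wrap something like audioStreamToWordsStream() from the example code.

```cpp
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for everything loaded once at startup. In the program
// above this would hold dnnModule, decoderFactory, decoderOptions and nTokens.
struct AsrContext {
  int nTokens = 0;
};

// Hypothetical wrapper: built once, then reused for every audio stream.
class Recognizer {
 public:
  // The inference step is injected as a callable so this sketch compiles
  // without wav2letter; it reads audio and writes the recognized words.
  using InferFn =
      std::function<void(std::istream&, std::ostream&, const AsrContext&)>;

  Recognizer(AsrContext context, InferFn infer)
      : context_(std::move(context)), infer_(std::move(infer)) {}

  // Runs inference on one stream, reusing the already-loaded context.
  std::string transcribe(std::istream& audio) const {
    std::ostringstream words;
    infer_(audio, words, context_);
    return words.str();
  }

 private:
  AsrContext context_;
  InferFn infer_;
};
```

The point of the design is that the expensive loading happens once, in whatever code builds the `AsrContext`, while `transcribe()` can be called repeatedly. That is also the shape a Python binding would likely want to expose.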