alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Shared memory between workers. #167

Closed: vzxxbacq closed this issue 4 years ago.

vzxxbacq commented 5 years ago

Hello @alumae. I'm trying to reduce the memory cost of the workers. Is it possible to replace the model-loading APIs in online-server-gmm-decode-faster.cc with sys/shm (System V shared memory)? That way, I think workers could get HCLG from shared memory rather than from disk.
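For reference, the System V API mentioned here stores a plain block of bytes that other processes can attach to. The following is a minimal sketch, assuming the model data has already been flattened into a pointer-free float array. The function names are hypothetical and are not Kaldi APIs. The approach would not work for a heap-based `Fst` object, because that object's internal pointers are only valid in the process that created them.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Creates a System V segment and copies `n` floats into it; returns the id.
// IPC_PRIVATE is used so this sketch never collides with an existing key;
// unrelated worker processes would instead agree on a key (e.g. from ftok()).
int CreateFloatSegment(const float *data, std::size_t n) {
  int id = shmget(IPC_PRIVATE, n * sizeof(float), IPC_CREAT | 0600);
  if (id < 0) throw std::runtime_error("shmget failed");
  void *addr = shmat(id, nullptr, 0);
  if (addr == reinterpret_cast<void *>(-1)) {
    shmctl(id, IPC_RMID, nullptr);
    throw std::runtime_error("shmat failed");
  }
  std::memcpy(addr, data, n * sizeof(float));
  shmdt(addr);  // Detach; the segment itself lives on until IPC_RMID.
  return id;
}

// Attaches read-only, as a decoding worker would. The caller reads the data
// in place (no per-process copy) and calls shmdt() when done.
const float *AttachReadOnly(int id) {
  void *addr = shmat(id, nullptr, SHM_RDONLY);
  if (addr == reinterpret_cast<void *>(-1)) throw std::runtime_error("shmat failed");
  return static_cast<const float *>(addr);
}
```

The segment is removed with `shmctl(id, IPC_RMID, nullptr)` once no worker needs it. The kernel frees it after the last process detaches.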

gilamsalem commented 5 years ago

Hi @vzxxbacq I am trying to do the same.

What I did was edit the Kaldi online GStreamer plugin (gst-kaldi-nnet2-online) to keep a pointer to the FST object (HCLG) and share it between threads. I am not sure this is the right approach, but it seemed like the easiest/fastest way, and it actually works.

It would be nice if you could share your experience/thoughts.

alumae commented 5 years ago

Hey @gilamsalem, I am really curious about your solution. When I run multiple workers, my understanding is that the GStreamer plugin is loaded separately in each worker process, so the instances cannot share memory. Am I wrong?
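The point about separate processes can be checked directly. After `fork()`, ordinary heap memory is copy-on-write, so a write in one worker process is invisible to the others. A mapping created with `MAP_SHARED`, by contrast, refers to the same physical pages in both processes. The following is a minimal POSIX/Linux sketch, and the names in it are illustrative only:

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <stdexcept>

// What the parent sees after a child process overwrote both values.
struct ForkResult {
  int heap_value;
  int shared_value;
};

ForkResult WriteFromChild() {
  int *heap_int = new int(1);  // ordinary heap memory: private after fork()
  void *mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) throw std::runtime_error("mmap failed");
  int *shared_int = static_cast<int *>(mem);  // explicitly shared mapping
  *shared_int = 1;

  pid_t pid = fork();
  if (pid < 0) throw std::runtime_error("fork failed");
  if (pid == 0) {  // child: stands in for a separate worker process
    *heap_int = 2;    // modifies only the child's private copy
    *shared_int = 2;  // modifies the page both processes map
    _exit(0);
  }
  waitpid(pid, nullptr, 0);
  ForkResult r{*heap_int, *shared_int};
  delete heap_int;
  munmap(mem, sizeof(int));
  return r;
}
```

The parent still sees `heap_value == 1` but `shared_value == 2`. Per-process plugin instances behave like `heap_int` unless the model is deliberately placed in a shared mapping.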

vzxxbacq commented 5 years ago

Hello @gilamsalem and @alumae, I am still trying and haven't got it working yet, but I'm happy to share my experience with you and hope it helps. I think the easiest way is to override the function that reads the FST from disk (fstext/kaldi-fst-io), and I chose the boost/interprocess library to manage the shared-memory object. Here's my code:

#include <iostream>
#include <string>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/shared_memory_object.hpp>

// ReadFstKaldiGenericRaw is the original read function, renamed.
Fst<StdArc> *ReadFstKaldiGeneric(std::string rxfilename, bool throw_on_err) {
  namespace bip = boost::interprocess;
  try {
    // Use the try/catch block to detect whether the shared memory already
    // exists: the first process reads the FST from disk and creates the
    // segment; later processes map the existing segment instead.
    // create_only throws an exception if the shared memory already exists.
    bip::shared_memory_object shm(bip::create_only, "Model", bip::read_write);
    Fst<StdArc> *fst = fst::ReadFstKaldiGenericRaw(rxfilename, throw_on_err);
    // Note: sizeof(*fst) is the size of the Fst<StdArc> base object only,
    // not of the states and arcs it owns on the heap.
    shm.truncate(sizeof(*fst));
    // Note: shm_region unmaps the segment when it goes out of scope at return.
    bip::mapped_region shm_region(shm, bip::read_write);
    Fst<StdArc> *shm_fst = static_cast<Fst<StdArc>*>(shm_region.get_address());
    // Note: this assignment does not deep-copy the FST's heap data into
    // shared memory.
    *shm_fst = *fst;
    delete fst;
    return shm_fst;
  } catch (bip::interprocess_exception &ex) {
    std::cout << "Shared memory already created. Mapping from it." << std::endl;
    bip::shared_memory_object shm(bip::open_only, "Model", bip::read_write);
    bip::mapped_region shm_region(shm, bip::read_write);
    Fst<StdArc> *shm_fst = static_cast<Fst<StdArc>*>(shm_region.get_address());
    return shm_fst;
  }
}

This code reads the FST correctly, but decoding crashes with a segmentation fault. From Stack Overflow, I learned that the cause is dynamic memory allocation. I'm not familiar with OpenFst, so I got stuck here and am not sure how to deal with it now.

gilamsalem commented 5 years ago

Hi @alumae. Given the current design of kaldi-gstreamer-server, where each worker runs in a separate process, you are right. In my setup, a single process instantiates X Java workers using X threads, and it seems to work. Honestly, I haven't tested it thoroughly, but all workers return the final transcription, and memory usage dropped by gigabytes. I can share my plugin changes if you are interested.

BTW, have you ever considered redesigning the architecture so that several workers run in the same process (something like a worker group)?

gilamsalem commented 5 years ago

Hi @vzxxbacq, I also thought about reading from disk instead of keeping the model in memory. My first intuition was that it would slow down the decoder and generate a lot of disk I/O (which might be harmful to hard drives).
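One middle ground between per-process copies and repeated disk reads is to map the model file read-only. The kernel loads each page into its page cache on first access, and every process that maps the same file shares those cached pages. After that first touch, no further disk I/O occurs unless memory pressure evicts the pages. This only helps if the file format can be used in place, without deserialization into heap objects. The following is a minimal POSIX sketch, and `MappedFile` is a hypothetical helper:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Read-only view of a whole file mapped into memory. Pages are loaded lazily
// and shared, via the page cache, with other processes mapping the same file.
class MappedFile {
 public:
  explicit MappedFile(const std::string &path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (fstat(fd, &st) != 0) {
      close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    void *addr = mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    data_ = addr;
  }
  ~MappedFile() { munmap(data_, size_); }
  MappedFile(const MappedFile &) = delete;
  MappedFile &operator=(const MappedFile &) = delete;

  const void *data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  void *data_ = nullptr;
  std::size_t size_ = 0;
};
```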

vzxxbacq commented 5 years ago

Hi @gilamsalem, my solution maps HCLG into shared memory, so it only needs to be read from disk once. However, when we copy the FST from local memory into shared memory, the program crashes, because the copy constructor dynamically allocates memory outside the shared-memory segment we registered. So my approach seems impossible unless we change some of the OpenFst code.
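The crash follows from how a heap-based FST is laid out. Any object copied into the segment still holds pointers into the creating process's heap, and those pointers are meaningless in another process. Data placed in shared memory generally has to be pointer-free, for example arrays addressed by index or offset. The following is a minimal sketch of such a layout, assuming a simplified arc type. `FlatArc`, `Flatten`, and `FlatFstView` are hypothetical names. This is not OpenFst's on-disk format, though it is similar in spirit to OpenFst's `ConstFst`, which keeps states and arcs in contiguous arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified arc: labels, weight, and destination state as an index.
struct FlatArc {
  int32_t ilabel;
  int32_t olabel;
  float weight;
  int32_t nextstate;
};

// Buffer layout: FlatHeader | int32_t first_arc[num_states + 1] | FlatArc arcs[num_arcs].
// Nothing in it is a pointer, so the bytes can live at any address.
struct FlatHeader {
  int32_t num_states;
  int32_t num_arcs;
};

// Serializes per-state arc lists into one contiguous byte buffer.
std::vector<char> Flatten(const std::vector<std::vector<FlatArc>> &states) {
  FlatHeader h{static_cast<int32_t>(states.size()), 0};
  std::vector<int32_t> first_arc{0};
  for (const auto &arcs : states) {
    h.num_arcs += static_cast<int32_t>(arcs.size());
    first_arc.push_back(h.num_arcs);
  }
  std::vector<char> buf(sizeof(FlatHeader) + first_arc.size() * sizeof(int32_t) +
                        h.num_arcs * sizeof(FlatArc));
  char *p = buf.data();
  std::memcpy(p, &h, sizeof h);
  p += sizeof h;
  std::memcpy(p, first_arc.data(), first_arc.size() * sizeof(int32_t));
  p += first_arc.size() * sizeof(int32_t);
  for (const auto &arcs : states) {
    if (arcs.empty()) continue;
    std::memcpy(p, arcs.data(), arcs.size() * sizeof(FlatArc));
    p += arcs.size() * sizeof(FlatArc);
  }
  return buf;
}

// Read-only view over a flattened buffer, wherever the bytes live
// (heap, shared-memory segment, or memory-mapped file).
class FlatFstView {
 public:
  explicit FlatFstView(const char *base) : base_(base) {
    std::memcpy(&header_, base_, sizeof header_);
  }
  int32_t NumStates() const { return header_.num_states; }
  int32_t NumArcs(int32_t s) const { return FirstArc(s + 1) - FirstArc(s); }
  FlatArc GetArc(int32_t s, int32_t i) const {
    const char *arcs = base_ + sizeof(FlatHeader) +
                       (header_.num_states + 1) * sizeof(int32_t);
    FlatArc arc;
    std::memcpy(&arc, arcs + (FirstArc(s) + i) * sizeof(FlatArc), sizeof arc);
    return arc;
  }

 private:
  int32_t FirstArc(int32_t s) const {
    int32_t v;
    std::memcpy(&v, base_ + sizeof(FlatHeader) + s * sizeof(int32_t), sizeof v);
    return v;
  }
  const char *base_;
  FlatHeader header_;
};
```

Because the view only computes offsets from `base_`, the same bytes can be copied into a shared segment once and read by every worker at whatever address each process maps it.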

YunzhaoLu commented 5 years ago

Hi @gilamsalem, I am very interested in your plugin for running multiple workers with shared memory. Would you please share it with me (yunzhao.lu@gmail.com)? Thank you!

rohithkodali commented 5 years ago

Hi @gilamsalem, I'm also interested in your solution and willing to collaborate with you on this. Is there any way I can get access to your repo, or could you send it to my email, rohitkodali@gmail.com?

gilamsalem commented 5 years ago

Hi @rohithkodali

I will try to summarize what I did, but it's important to understand that for me it was more of a proof of concept, so I don't have a repo, and I didn't handle all the edge cases.

The first step was to re-implement the worker as a worker group: instead of a new process per worker (as now), I have one process with multiple workers, each in its own thread. I did it in Java, but you can easily do the same using the current Python code.
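The worker-group idea can also be sketched in C++ rather than Java or Python. One process starts several threads. Every thread receives a const reference to a single model that was loaded once, and each thread keeps its own mutable decoding state. This is a minimal sketch. `Model`, `DecodeUtterance`, and `RunWorkerGroup` are hypothetical stand-ins for the real model and decoder. The pattern is only safe if the shared model is read concurrently and never modified.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a large, read-only model (e.g. HCLG + acoustic model).
struct Model {
  std::vector<float> weights;
};

// Hypothetical decoding step: reads the shared model, writes only local state.
float DecodeUtterance(const Model &model, int utterance_id) {
  float score = 0.0f;  // per-thread, per-utterance state
  for (float w : model.weights) score += w * static_cast<float>(utterance_id);
  return score;
}

// Runs `num_workers` threads in one process, all sharing `model` read-only.
// Worker i decodes utterance i and writes only to results[i].
std::vector<float> RunWorkerGroup(const Model &model, std::size_t num_workers) {
  std::vector<float> results(num_workers);
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_workers; ++i) {
    workers.emplace_back([&model, &results, i] {
      results[i] = DecodeUtterance(model, static_cast<int>(i));
    });
  }
  for (auto &t : workers) t.join();
  return results;
}
```

Memory for the model is paid once per process instead of once per worker, which matches the saving reported above.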

The second step was to add caching to the Kaldi nnet2 online decoder (the GStreamer plugin). I added a map from model path to pointer: if a new worker thread starts with the same model path, I assign it the same pointer instead of reloading the whole model from disk. I did the same for both the model and fst attributes. This code is written in C++ and I can share it with you.
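The caching step can be sketched as a small thread-safe cache. The cache maps each model path to a loaded object, so a second worker thread with the same path reuses that object. This illustration is not the code from the attached zip. `ModelCache` and its loader callback are hypothetical. The cache uses `std::shared_ptr<const T>` so that the shared model is read-only and is freed when the last worker releases it.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Caches one loaded object per file path and hands the same read-only
// instance to every caller that asks for that path.
template <typename T>
class ModelCache {
 public:
  using Loader = std::function<std::unique_ptr<T>(const std::string &)>;

  explicit ModelCache(Loader loader) : loader_(std::move(loader)) {}

  std::shared_ptr<const T> Get(const std::string &path) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(path);
    if (it != cache_.end()) return it->second;  // already loaded: reuse pointer
    std::shared_ptr<const T> model = loader_(path);  // first request: load once
    cache_.emplace(path, model);
    return model;
  }

 private:
  Loader loader_;
  std::mutex mutex_;
  std::map<std::string, std::shared_ptr<const T>> cache_;
};
```

Holding the mutex while loading keeps the sketch simple, but it serializes loads of different paths. A production version might load outside the lock.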

At that point I ran some tests with the original plugin (no caching) and the modified one (with caching), and I noticed that memory usage was reduced significantly.

abhiman84 commented 4 years ago

Hi @gilamsalem, it would be great if you could share your C++ code for reference at abhiman.84@gmail.com.

lucgeo commented 4 years ago

Hello @gilamsalem, I have the same request. If you can share your code, please send it to lucgeo92@gmail.com. Thanks!

gilamsalem commented 4 years ago

Hi, sorry for the late response. My files are attached. Enjoy!

gstkaldinnet2onlinedecoder.zip

lucgeo commented 4 years ago

@gilamsalem Thank you!

vzxxbacq commented 4 years ago

@gilamsalem Thank you, very helpful code. After looking into it, I think anyone who wants to use your code must first re-implement the worker as a threaded worker, which is hard to do in Python because of the GIL.

shatealaboxiaowang commented 2 years ago

@gilamsalem Thank you for sharing, but compiling your code failed. Could you help me analyze the problem? The error is:

gstkaldinnet2onlinedecoder.cc: In function 'void kaldi::gst_kaldinnet2onlinedecoder_threaded_decode_segment(kaldi::Gstkaldinnet2onlinedecoder*, bool&, int32, kaldi::BaseFloat, kaldi::Vector<float>*)':
gstkaldinnet2onlinedecoder.cc:1295:68: error: no matching function for call to 'kaldi::SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded(kaldi::OnlineNnet2DecodingThreadedConfig&, kaldi::TransitionModel&, kaldi::nnet2::AmNnet&, fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl > >&, kaldi::OnlineNnet2FeaturePipelineInfo&, kaldi::OnlineIvectorExtractorAdaptationState&)'
*(filter->adaptation_state));
^
In file included from ./gstkaldinnet2onlinedecoder.h:30:0,
from gstkaldinnet2onlinedecoder.cc:50:
/opt/kaldi/src/online2/online-nnet2-decoding-threaded.h:197:3: note: candidate: kaldi::SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded(const kaldi::OnlineNnet2DecodingThreadedConfig&, const kaldi::TransitionModel&, const kaldi::nnet2::AmNnet&, const fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl > >&, const kaldi::OnlineNnet2FeaturePipelineInfo&, const kaldi::OnlineIvectorExtractorAdaptationState&, const kaldi::OnlineCmvnState&)
SingleUtteranceNnet2DecoderThreaded(
^~~~~~~~~~~
/opt/kaldi/src/online2/online-nnet2-decoding-threaded.h:197:3: note: candidate expects 7 arguments, 6 provided
: recipe for target 'gstkaldinnet2onlinedecoder.o' failed
make: *** [gstkaldinnet2onlinedecoder.o] Error 1

Thanks!

shatealaboxiaowang commented 2 years ago

@vzxxbacq @lucgeo Did either of you compile the code in gstkaldinnet2onlinedecoder.zip successfully? An error occurred during compilation: the same "no matching function for call to 'kaldi::SingleUtteranceNnet2DecoderThreaded::SingleUtteranceNnet2DecoderThreaded(...)'" error as in my previous comment, ending in "candidate expects 7 arguments, 6 provided" and "make: *** [gstkaldinnet2onlinedecoder.o] Error 1".

vzxxbacq commented 2 years ago

@shatealaboxiaowang I built his project successfully. I don't know why you got the error; it seems like an issue with the Kaldi library you are building against. Try this: https://github.com/vzxxbacq/kaldi-gst-nnet-plugin

shatealaboxiaowang commented 2 years ago


Thanks, compilation passed, but an error is reported when the worker is changed to multi-threaded. Have you ever encountered this? The error is:

ERROR ([5.5.0~1-40c7]:ExpectToken():io-funcs.cc:200) Failed to read token [started at file position 0], expected
[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xa71) [0x7fc28af7150f]
/opt/gst-kaldi-nnet2-online/src/libgstkaldinnet2onlinedecoder.so(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x11) [0x7fc28d16c487]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::ExpectToken(std::istream&, bool, char const*)+0x15e) [0x7fc28af72f52]
/opt/kaldi/src/lib/libkaldi-online2.so(kaldi::OnlineIvectorExtractorAdaptationState::Read(std::istream&, bool)+0x1e) [0x7fc28ca94392]
/opt/gst-kaldi-nnet2-online/src/libgstkaldinnet2onlinedecoder.so(+0x6c3f1) [0x7fc28d1623f1]
/usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0(g_object_set_property+0x20b) [0x7fc2922db18b]
/usr/lib/python2.7/dist-packages/gi/_gi.x86_64-linux-gnu.so(+0x15dba) [0x7fc292961dba]

biruand1016 commented 1 year ago

Hi. I just read this and downloaded gstkaldinnet2onlinedecoder.zip, but I don't know how to compile it. There is no Makefile, and I'm having trouble because of a mismatch between this code and the Kaldi source for class SingleUtteranceNnet2DecoderThreaded. Please help me and let me know how to compile it. Thanks.

biruand1016 commented 1 year ago

Hi @shatealaboxiaowang and @vzxxbacq. I tried to compile using https://github.com/vzxxbacq/kaldi-gst-nnet-plugin as well, but it failed too. Please help me.