alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

VoskRecognizer is not thread safe #606

Closed. Abdullah-AlAttar closed this issue 3 years ago.

Abdullah-AlAttar commented 3 years ago

Here is the code I was working on. I get a SIGSEGV at random (it does not happen every time, so try running it several times) when testing with the "vosk-model-en-us-aspire-0.2" model and test16k.wav.
Note: make sure the loop count (20) is greater than the number of threads on your machine; this helps reproduce the issue faster.


#include "vosk_api.h"
#include <stdio.h>

/* Reproducer: one shared VoskModel, one VoskRecognizer per OpenMP iteration.
 * Build with -fopenmp. Usage: ./vosk_test <model-dir> <16kHz-mono-wav> */
int main(int argc, char const *argv[])
{
    if (argc < 3)
    {
        fprintf(stderr, "usage: %s <model-dir> <wav-file>\n", argv[0]);
        return 1;
    }

    VoskModel *model = vosk_model_new(argv[1]);
#pragma omp parallel for
    for (int i = 0; i < 20; i++)
    {
        char buf[3200];
        size_t nread;
        /* Each iteration owns its recognizer; only the model is shared. */
        VoskRecognizer *recognizer = vosk_recognizer_new(model, 16000.0);

        FILE *wavin = fopen(argv[2], "rb");
        if (wavin == NULL)
        {
            vosk_recognizer_free(recognizer);
            continue;
        }
        fseek(wavin, 44, SEEK_SET); /* skip the canonical 44-byte WAV header */
        /* Loop on fread's return value rather than feof(), so a zero-length
         * chunk is never passed to the recognizer. */
        while ((nread = fread(buf, 1, sizeof(buf), wavin)) > 0)
        {
            /* Results are discarded; printing them is not needed to trigger the crash. */
            if (vosk_recognizer_accept_waveform(recognizer, buf, (int)nread))
                vosk_recognizer_result(recognizer);
            else
                vosk_recognizer_partial_result(recognizer);
        }
        vosk_recognizer_final_result(recognizer);
        vosk_recognizer_free(recognizer);
        fclose(wavin);
    }
    vosk_model_free(model);
    return 0;
}
// './vosk_test /app/vosk-model-e...' terminated by signal SIGSEGV (Address boundary error)

And here is the Makefile:

KALDI_ROOT?=$(HOME)/travis/kaldi
OPENFST_ROOT?=$(KALDI_ROOT)/tools/openfst
OPENBLAS_ROOT?=$(KALDI_ROOT)/tools/OpenBLAS/install
HAVE_CUDA?=0
CUDA_ROOT?=/usr/local/cuda
EXT?=so
CXX?=g++
HAVE_OPENBLAS_CLAPACK=1
HAVE_ACCELERATE=0
EXTRA_CFLAGS?=
EXTRA_LDFLAGS?=

VOSK_SOURCES= \
    kaldi_recognizer.cc \
    language_model.cc \
    model.cc \
    spk_model.cc \
    vosk_api.cc

CFLAGS=-g -O0 -std=c++17 -fPIC -DFST_NO_DYNAMIC_LINKING $(EXTRA_CFLAGS) \
    -I. -I$(KALDI_ROOT)/src -I$(OPENFST_ROOT)/include -I$(OPENBLAS_ROOT)/include

LIBS= \
    $(KALDI_ROOT)/src/online2/kaldi-online2.a \
    $(KALDI_ROOT)/src/decoder/kaldi-decoder.a \
    $(KALDI_ROOT)/src/ivector/kaldi-ivector.a \
    $(KALDI_ROOT)/src/gmm/kaldi-gmm.a \
    $(KALDI_ROOT)/src/nnet3/kaldi-nnet3.a \
    $(KALDI_ROOT)/src/tree/kaldi-tree.a \
    $(KALDI_ROOT)/src/feat/kaldi-feat.a \
    $(KALDI_ROOT)/src/lat/kaldi-lat.a \
    $(KALDI_ROOT)/src/lm/kaldi-lm.a \
    $(KALDI_ROOT)/src/rnnlm/kaldi-rnnlm.a \
    $(KALDI_ROOT)/src/hmm/kaldi-hmm.a \
    $(KALDI_ROOT)/src/transform/kaldi-transform.a \
    $(KALDI_ROOT)/src/cudamatrix/kaldi-cudamatrix.a \
    $(KALDI_ROOT)/src/matrix/kaldi-matrix.a \
    $(KALDI_ROOT)/src/fstext/kaldi-fstext.a \
    $(KALDI_ROOT)/src/util/kaldi-util.a \
    $(KALDI_ROOT)/src/base/kaldi-base.a \
    $(OPENFST_ROOT)/lib/libfst.a \
    $(OPENFST_ROOT)/lib/libfstngram.a

ifeq ($(HAVE_OPENBLAS_CLAPACK), 1)
LIBS += \
    $(OPENBLAS_ROOT)/lib/libopenblas.a \
    $(OPENBLAS_ROOT)/lib/liblapack.a \
    $(OPENBLAS_ROOT)/lib/libblas.a \
    $(OPENBLAS_ROOT)/lib/libf2c.a
endif

ifeq ($(HAVE_ACCELERATE), 1)
LIBS += \
    -framework Accelerate
endif

ifeq ($(HAVE_CUDA), 1)
CFLAGS+=-DHAVE_CUDA=1 -I$(CUDA_ROOT)/include
LIBS+=-L$(CUDA_ROOT)/lib64 -lcublas -lcusparse -lcudart -lcurand -lcufft -lcusolver -lnvToolsExt
endif

all: libvosk.$(EXT) vosk_test

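libvosk.$(EXT): $(VOSK_SOURCES:.cc=.o)
	$(CXX) --shared -s -o $@ $^ $(LIBS) -lm -latomic $(EXTRA_LDFLAGS)

# Libraries go after the sources that use them; toolchains that link with
# --as-needed by default would otherwise drop libvosk and fail to link.
vosk_test: libvosk.$(EXT) main.cpp
	$(CXX) -fopenmp main.cpp -o $@ -L. -lvosk -ldl

%.o: %.cc
	$(CXX) $(CFLAGS) -c -o $@ $<

clean:
	rm -f *.o *.so *.dll vosk_test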

My guess is that the issue comes from the recognizer incrementing and decrementing the model's reference count. I don't think it should be doing that: it can lead to several recognizers calling model_->Unref() at the same time and possibly deleting the model while we are not done with it.

nshmyrev commented 3 years ago

Great catch!

What is your proposal to deal with this? We can just introduce a mutex.

Abdullah-AlAttar commented 3 years ago

Or should KaldiRecognizer not be responsible for the memory management of Model and SpkModel at all? I don't see the point of having each KaldiRecognizer increment and decrement the model's reference count; the user should be responsible for managing the model's memory. A mutex would work, I guess, but in my case I simply removed this code from ~KaldiRecognizer():

    model_->Unref();
    if (spk_model_)
        spk_model_->Unref();

since I only need to release the model at the end of my program.

nshmyrev commented 3 years ago

> The user should be responsible for the memory management of the Model.

That's not the case for bindings in garbage-collected languages (Python, Java), where you can't really control when an object is destroyed.

Abdullah-AlAttar commented 3 years ago

Sorry for the misunderstanding; I worded that badly.

    def __del__(self):
        _c.vosk_model_free(self._handle)

This is already done in the bindings, which in my opinion is sufficient. What I mean is to make this the only way to destroy the model (i.e. remove the model's reference counting inside the recognizer). I'm not sure whether this would cause issues with the other bindings, or in general, since I haven't tested it. Edit: I might be completely wrong, and maybe this is not even the cause of the crash.

nshmyrev commented 3 years ago

Please test the commit above and close the issue if fixed.

Abdullah-AlAttar commented 3 years ago

Just tested, not crashing anymore.

javide commented 2 years ago

@Abdullah-AlAttar @nshmyrev From the release notes I can see this fix was included in v0.3.30, but vosk-0.3.30.jar was built a couple of weeks earlier: https://alphacephei.com/maven/com/alphacephei/vosk/0.3.30/ Is there any chance you could make a new build that includes this fix?

nshmyrev commented 2 years ago

@javide I've just updated the jars; please clear your cache and try again.

javide commented 2 years ago

@nshmyrev Thank you for the prompt update. I cleared the cache, but I can still reproduce this issue:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000012a92e50f, pid=98332, tid=28675
#
# JRE version: OpenJDK Runtime Environment (11.0.2+9) (build 11.0.2+9)
# Java VM: OpenJDK 64-Bit Server VM (11.0.2+9, mixed mode, tiered, compressed oops, g1 gc, bsd-amd64)
# Problematic frame:
# C  [jna6688722770364858760.tmp+0x9e50f]  _ZN5ModelD2Ev+0xeabf
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  S U M M A R Y ------------

Command Line: -Xss512m -Xss512m -Dfile.encoding=UTF-8 -Duser.country=GB -Duser.language=en -Duser.variant au.com.realthing.kaldi.Main

Host: MacBookPro14,3 x86_64 3100 MHz, 8 cores, 16G, Darwin 20.6.0
Time: Mon Aug  9 10:46:29 2021 AEST elapsed time: 3862 seconds (0d 1h 4m 22s)

---------------  T H R E A D  ---------------

Current thread (0x00007fe49daf3800):  JavaThread "pool-1-thread-10" [_thread_in_native, id=28675, stack(0x000070022d5bf000,0x000070024d5bf000)]

Stack: [0x000070022d5bf000,0x000070024d5bf000],  sp=0x000070024d5bd530,  free space=524281k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [jna6688722770364858760.tmp+0x9e50f]  _ZN5ModelD2Ev+0xeabf
C  [jna6688722770364858760.tmp+0x9e129]  _ZN5ModelD2Ev+0xe6d9
C  [jna6688722770364858760.tmp+0x9dd75]  _ZN5ModelD2Ev+0xe325
C  [jna6688722770364858760.tmp+0x39bc5]  _ZN15KaldiRecognizer5ResetEv+0x2c345
C  [jna6688722770364858760.tmp+0x3996d]  _ZN15KaldiRecognizer5ResetEv+0x2c0ed
C  [jna6688722770364858760.tmp+0x395ed]  _ZN15KaldiRecognizer5ResetEv+0x2bd6d
C  [jna6688722770364858760.tmp+0x3937d]  _ZN15KaldiRecognizer5ResetEv+0x2bafd
C  [jna6688722770364858760.tmp+0x38efa]  _ZN15KaldiRecognizer5ResetEv+0x2b67a
C  [jna6688722770364858760.tmp+0x369f8]  _ZN15KaldiRecognizer5ResetEv+0x29178
C  [jna6688722770364858760.tmp+0x3d01f]  _ZN15KaldiRecognizer5ResetEv+0x2f79f
C  [jna6688722770364858760.tmp+0x3e138]  _ZN15KaldiRecognizer5ResetEv+0x308b8
C  [jna6688722770364858760.tmp+0x3d2a5]  _ZN15KaldiRecognizer5ResetEv+0x2fa25
C  [jna6688722770364858760.tmp+0xcefbb]  _ZN5kaldi23LatticeFasterDecoderTplIN3fst3FstINS1_6ArcTplINS1_17TropicalWeightTplIfEEEEEENS_7decoder16BackpointerTokenEE18ProcessNonemittingEf+0x26b
C  [jna6688722770364858760.tmp+0xcd323]  _ZN5kaldi23LatticeFasterDecoderTplIN3fst3FstINS1_6ArcTplINS1_17TropicalWeightTplIfEEEEEENS_7decoder16BackpointerTokenEE15AdvanceDecodingEPNS_18DecodableInterfaceEi+0x133
C  [jna6688722770364858760.tmp+0x4d9d]  _ZN15KaldiRecognizer14AcceptWaveformERN5kaldi6VectorIfEE+0xfd
C  [jna6688722770364858760.tmp+0x4be5]  _ZN15KaldiRecognizer14AcceptWaveformEPKci+0xe5
C  [jna6688722770364858760.tmp+0xaf5c9]  vosk_recognizer_accept_waveform+0x9
C  [jna14115690028186303728.tmp+0xf11a]  ffi_prep_go_closure+0x54a

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 1282  org.vosk.LibVosk.vosk_recognizer_accept_waveform(Lcom/sun/jna/Pointer;[BI)Z (0 bytes) @ 0x0000000115ccf636 [0x0000000115ccf5c0+0x0000000000000076]
J 2633 c2 au.com.realthing.kaldilibrary.Reader.call()Ljava/lang/String; (734 bytes) @ 0x0000000115dc4be4 [0x0000000115dc4340+0x00000000000008a4]
J 3870 c1 au.com.realthing.kaldilibrary.Reader.call()Ljava/lang/Object; (5 bytes) @ 0x000000010ee47d44 [0x000000010ee47cc0+0x0000000000000084]
J 3517 c1 java.util.concurrent.FutureTask.run()V java.base@11.0.2 (123 bytes) @ 0x000000010ecc601c [0x000000010ecc5940+0x00000000000006dc]
j  java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V+92 java.base@11.0.2
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.2
j  java.lang.Thread.run()V+11 java.base@11.0.2
v  ~StubRoutines::call_stub

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fe553d00000