Open j1elo opened 1 year ago
You can probably add log messages on recognizer free to see if it is really released before the async call ends. In that case, one needs to check the ffi bindings to see why they don't keep the reference. Or, maybe, we have to keep the reference ourselves.
While the Vosk Model is reference-counted (`vosk_model_free` just calls `Unref`, source), the Recognizer doesn't seem to be reference-counted (`vosk_recognizer_free` immediately deletes the object, source).
I've added 4 Warning messages to the code at `vosk-api/src/recognizer.cc`, in these functions:

* `AcceptWaveform(Vector<BaseFloat> &wdata)`
* `~Recognizer()`

and I ran with `DEBUG='ffi:_ForeignFunction'` to get debug logs from ffi-napi. The result is this on a good run:
ffi:_ForeignFunction invoking async proxy function +523ms
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:355) TRACE -- AcceptWaveform BEGIN
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:381) TRACE -- AcceptWaveform END (false)
ffi:_ForeignFunction invoking async proxy function +415ms
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:355) TRACE -- AcceptWaveform BEGIN
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:381) TRACE -- AcceptWaveform END (false)
ffi:_ForeignFunction invoking async proxy function +524ms
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:355) TRACE -- AcceptWaveform BEGIN
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:377) TRACE -- AcceptWaveform END (true)
ffi:_ForeignFunction invoking proxy function +189ms
WARNING (VoskAPI:~Recognizer():recognizer.cc:101) TRACE -- ~Recognizer() destructor BEGIN
WARNING (VoskAPI:~Recognizer():recognizer.cc:121) TRACE -- ~Recognizer() destructor END
And this on a bad one:
ffi:_ForeignFunction invoking async proxy function +678ms
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:355) TRACE -- AcceptWaveform BEGIN
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:381) TRACE -- AcceptWaveform END (false)
ffi:_ForeignFunction invoking async proxy function +645ms
WARNING (VoskAPI:AcceptWaveform():recognizer.cc:355) TRACE -- AcceptWaveform BEGIN
ffi:_ForeignFunction invoking proxy function +248ms
WARNING (VoskAPI:~Recognizer():recognizer.cc:101) TRACE -- ~Recognizer() destructor BEGIN
WARNING (VoskAPI:~HashList():util/hash-list-inl.h:117) Possible memory leak: 4836 != 5120: you might have forgotten to call Delete on some Elems
=================================================================
==376978==ERROR: AddressSanitizer: heap-use-after-free on address 0x619000985e28 at pc 0x7fe0e312a7a2 bp 0x7fe0e04a3630 sp 0x7fe0e04a3620
READ of size 1 at 0x619000985e28 thread T8
WARNING (VoskAPI:~Recognizer():recognizer.cc:121) TRACE -- ~Recognizer() destructor END
#0 0x7fe0e312a7a1 in kaldi::nnet3::NnetComputer::Run() (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0xfda7a1)
#1 0x7fe0e31f5de9 in kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk() (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x10a5de9)
#2 0x7fe0e31f68fa in kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x10a68fa)
#3 0x7fe0e2b923c6 in kaldi::LatticeIncrementalDecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >, kaldi::decoder::BackpointerToken>::ProcessEmitting(kaldi::DecodableInterface*) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0xa423c6)
#4 0x7fe0e2bd3e0b in kaldi::LatticeIncrementalDecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >, kaldi::decoder::BackpointerToken>::AdvanceDecoding(kaldi::DecodableInterface*, int) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0xa83e0b)
#5 0x7fe0e250a2fd in Recognizer::AcceptWaveform(kaldi::Vector<float>&) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x3ba2fd)
#6 0x7fe0e250a897 in Recognizer::AcceptWaveform(char const*, int) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x3ba897)
#7 0x7fe0e291fedc in vosk_recognizer_accept_waveform (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x7cfedc)
#8 0x7fe0e4c35704 in ffi_call_unix64 (/app/node_modules/ffi-napi/build/Release/ffi_bindings.node+0x15704)
#9 0x7fe0e4c3425b in ffi_call_int (/app/node_modules/ffi-napi/build/Release/ffi_bindings.node+0x1425b)
#10 0x7fe0e4c294a2 in FFI::FFI::AsyncFFICall(uv_work_s*) (/app/node_modules/ffi-napi/build/Release/ffi_bindings.node+0x94a2)
#11 0x1547a93 in worker ../deps/uv/src/threadpool.c:122
#12 0x7fe0f8f57608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477
#13 0x7fe0f8e7a132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)
0x619000985e28 is located 168 bytes inside of 1056-byte region [0x619000985d80,0x6190009861a0)
freed by thread T0 here:
#0 0x7fe0f943551f in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cc:165
#1 0x7fe0e25097e1 in Recognizer::~Recognizer() (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x3b97e1)
#2 0x7fe0e291ff8a in vosk_recognizer_free (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x7cff8a)
#3 0x7fe0e4c35704 in ffi_call_unix64 (/app/node_modules/ffi-napi/build/Release/ffi_bindings.node+0x15704)
#4 0x602000088d6f (<unknown module>)
previously allocated by thread T0 here:
#0 0x7fe0f9434587 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cc:104
#1 0x7fe0e252ebc8 in Recognizer::Recognizer(Model*, float) (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x3debc8)
#2 0x7fe0e291fd14 in vosk_recognizer_new (/app/node_modules/vosk/lib/linux-x86_64/libvosk.so+0x7cfd14)
#3 0x7fe0e4c35704 in ffi_call_unix64 (/app/node_modules/ffi-napi/build/Release/ffi_bindings.node+0x15704)
#4 0x6020000f3b8f (<unknown module>)
Which I think confirms that the issue is a destructor running while the AcceptWaveform was still running on its own thread.
> In that case, one needs to check the ffi bindings to see why they don't keep the reference.
I think the ffi bindings are in fact taking care to keep alive the memory of values passed to ffi functions. These lines seem to be doing the trick. According to the ref module docs, `writePointer` prevents the GC from releasing the memory.

However, that only covers the case of Node.js clearing the memory with its GC. It will never prevent the explicit destructor call that `Recognizer.free()` triggers.
> Or, maybe, we have to keep the reference ourselves.
Yes, it seems to me that this is needed. Otherwise, we need some other way to make the synchronous methods play well with the asynchronous ones.
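To illustrate "keeping the reference ourselves": here is a minimal user-land sketch that defers `free()` until every in-flight `acceptWaveformAsync()` call has settled. Only `acceptWaveformAsync()` and `free()` are assumed from the real vosk bindings; the guard itself is plain JavaScript.

```javascript
// Sketch: wrap a Recognizer so that free() is deferred until all pending
// acceptWaveformAsync() calls have settled. Assumes only that the wrapped
// object exposes acceptWaveformAsync() and free(), as vosk's Node.js
// bindings do.
class GuardedRecognizer {
  constructor(recognizer) {
    this.recognizer = recognizer;
    this.pending = 0;          // in-flight acceptWaveformAsync() calls
    this.freeRequested = false;
  }

  async acceptWaveformAsync(chunk) {
    if (this.freeRequested) {
      throw new Error("Recognizer already freed");
    }
    this.pending++;
    try {
      return await this.recognizer.acceptWaveformAsync(chunk);
    } finally {
      this.pending--;
      // Run the deferred free() once the last pending call settles.
      if (this.freeRequested && this.pending === 0) {
        this.recognizer.free();
      }
    }
  }

  free() {
    this.freeRequested = true;
    if (this.pending === 0) {
      this.recognizer.free();
    }
  }
}
```

With a wrapper like this, a `free()` issued mid-transcription becomes a request that is honored only after the worker thread has returned, instead of deleting the object out from under it.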
Adding a bit more info to this issue...
I have been learning the hard way that the `acceptWaveformAsync()` method is a very dangerous beast, and calling `free()` in the middle of its processing is not the only issue with it. It cannot simply be used in the typical single-threaded, asynchronous Node.js code style.

For example, the following code will crash very confusingly, with `ASSERTION_FAILED` messages from Vosk (`audioStream` is a Node.js Readable stream):
```js
audioStream.on("readable", async () => {
    const chunk = audioStream.read();
    console.log("#### 1");
    const ready = await recognizer.acceptWaveformAsync(chunk);
    console.log("#### 2");
    // ... Check if ready and call result() or partialResult() ...
});
```
I'm attaching a sample file that can be used to reproduce this scenario: index.js.txt (rename to `.js`).
Such code may run `acceptWaveformAsync()` multiple times concurrently if the event is emitted very rapidly. Unsuspecting users will write this code any day, as event-based code like this is typical in Node.js.
If the output happens to be like this, then all is good:
#### 1
#### 2
#### 1
#### 2
#### 1
#### 2
But any system hiccup that breaks the timing and causes a couple of events to overlap will break it:
#### 1
#### 2
#### 1
#### 2
#### 1
#### 1
ASSERTION_FAILED [...]
Aborted (core dumped)
The assertion message varies; these are the ones I've seen:
ASSERTION_FAILED (VoskAPI:ExtractWindow():feature-window.cc:182) Assertion failed: (start_sample >= sample_offset && end_sample <= num_samples)
ASSERTION_FAILED (VoskAPI:UpdateStatsUntilFrameWeighted():online-ivector-feature.cc:285) Assertion failed: (frame >= 0 && frame < this->NumFramesReady() && delta_weights_provided_ && ! updated_with_no_delta_weights_ && frame <= most_recent_frame_with_weight_)
WARNING (VoskAPI:BestPathEnd():lattice-incremental-online-decoder.cc:99) No final token found. ASSERTION_FAILED (VoskAPI:TraceBackBestPath():lattice-incremental-online-decoder.cc:110) Assertion failed: (!iter.Done() && oarc != NULL)
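For what it's worth, these crashes can be avoided in user code by making sure only one chunk is ever in flight. One way, sketched below under the assumption that the recognizer exposes `acceptWaveformAsync()` as in vosk's Node.js bindings, is to pause the stream while a call is running:

```javascript
// Sketch: feed a Readable stream to a recognizer one chunk at a time,
// pausing the stream while acceptWaveformAsync() is in flight so that
// two calls can never overlap. onReady is a caller-provided callback,
// e.g. a place to call recognizer.result().
function transcribeSerially(audioStream, recognizer, onReady) {
  audioStream.on("data", async (chunk) => {
    audioStream.pause(); // no more "data" events until we resume
    const ready = await recognizer.acceptWaveformAsync(chunk);
    if (ready) {
      onReady();
    }
    audioStream.resume();
  });
}
```

Because `pause()` runs synchronously at the top of the handler, Node.js buffers any further chunks until `resume()`, so the overlapping-call scenario above cannot occur.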
This is just my suggestion, but I feel that if `acceptWaveformAsync()` cannot run twice concurrently, the Promise from the second call should probably just wait until the first run has finished.

Another solution is to say that Vosk is not responsible for this, and I think that would be a valid stance (not ideal, but I guess it's valid). In that case, though, the docs need very prominent extra documentation to warn users about the dangers of this method, explain the technicalities of the problem, and instruct application writers to implement their own locking mechanisms. Otherwise, other people like me will waste countless hours scratching their heads at assertion crashes, wondering why Vosk fails so much :-)
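As a sketch of that first suggestion, the serialization could even be done by wrapping the method itself, so that a second call transparently waits for the first. The names here are illustrative, not vosk internals; only `acceptWaveformAsync()` is assumed from the real API.

```javascript
// Sketch: monkey-patch acceptWaveformAsync() so overlapping calls are
// queued on a promise chain instead of running concurrently.
function serializeAcceptWaveform(recognizer) {
  const original = recognizer.acceptWaveformAsync.bind(recognizer);
  let tail = Promise.resolve();
  recognizer.acceptWaveformAsync = (chunk) => {
    const run = tail.then(() => original(chunk));
    tail = run.catch(() => {}); // keep the chain alive after a rejection
    return run;
  };
  return recognizer;
}
```

After calling `serializeAcceptWaveform(recognizer)`, the naive event-handler code shown earlier would be safe as written, because the second call's Promise simply waits for the first to settle.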
I'm facing a server crash which ultimately happens because a Recognizer is freed while still in use by a call to `acceptWaveformAsync()`.

The issue happens as follows: my Node.js server processes several transcription requests from remote clients, and each one is handled with a Recognizer instance. Each chunk of audio input is then transcribed with calls to `Recognizer.acceptWaveformAsync()`. When the remote client shuts its connection down (e.g. the remote app is closed), the server calls `Recognizer.free()`.

Judging from the AddressSanitizer stack traces, it seems to me that if an `acceptWaveformAsync()` call was still running when `free()` was called, this can trigger an "AddressSanitizer: heap-use-after-free" and a crash. But that's just my feeling when I look at the stack traces; it might be something completely different that I'm not realizing.

A couple of different instances of this issue:
Crash 1
Crash 2