coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0
2.23k stars, 267 forks

Bug: System.AccessViolationException on IntermediateDecode from .NET #1952

Open Henderz opened 3 years ago

Henderz commented 3 years ago

Describe the bug

(As I understand it, the native_client for .NET is still 'DeepSpeech'; please correct me if I am wrong.) I am using DeepSpeech for inference on a microphone stream captured via the CSCore audio module. I have custom code for VAD and run intermediate decoding to get sentence-wise live transcription.

Models: 9.0.3 Pre-Trained English Audio Model and custom Scorers with the same hyper-parameters as the Pre-Trained Scorer.

This works, but at random times I get one of the following unhandled exceptions from the .so file:

1) StackOverflowException from objDeepSpeech.FeedAudioContent(objStream, buffers, Convert.ToUInt32(buffers.Length));
2) System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.' from objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1);

The errors originate from within libdeepspeech.so, so I am not able to debug any further. Any help is much appreciated. Thanks.

To Reproduce

Steps to reproduce the behavior:
1) Use CSCore to listen to the mic.
2) Call objDeepSpeech.IntermediateDecodeWithMetadata(objStream, 1); occasionally (triggered by VAD or just at random).
3) After between 30 seconds and 2 minutes, the decode call fails with AccessViolationException. The exception cannot be handled because it comes from the unmanaged C/C++ code compiled into the .so file.

Expected behavior

1) The calls keep working indefinitely, as long as system resources allow it.

Environment (please complete the following information):

Additional context

No failure is seen when using NAudio, but STT accuracy is a lot worse than with CSCore for the same audio.

Can NAudio be made to work better for intermediate decoding instead? Is any other language inherently better at this than C#?

reuben commented 2 years ago

Can you provide a script/repo that can be built/run to reproduce this? Ideally without requiring listening from a live mic. And also the full error stack (including the native frames if that's possible). Thank you!

remyzerems commented 2 years ago

Hi!

I am facing the same issue in a different context. My program is pure C++, I use PortAudio as the audio input provider, and I get an access violation exception in libstt.so when calling STT_FeedAudioContent (I'm not calling IntermediateDecode at all). I set up a thread that gets the input data from the microphone and then sends it to STT through STT_FeedAudioContent. I noticed I could not run STT_FeedAudioContent in parallel, so I added a mutex to prevent it from being called multiple times at the same time, but I still get the access violation...
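For readers comparing setups, one common alternative to guarding the feed call with a mutex is to let the capture thread only enqueue audio chunks and have a single worker thread own the stream and make every feed call. The following is a minimal sketch of that pattern, not a confirmed fix for this crash. `FakeStream`, `feed_audio`, `ChunkQueue` and `run_pipeline` are hypothetical names invented for illustration; `feed_audio` merely stands in for the real feed call and only counts samples.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe queue of audio chunks. The capture side pushes and one worker
// pops, so only that worker ever touches the stream.
class ChunkQueue {
public:
    void push(std::vector<int16_t> chunk) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            chunks_.push_back(std::move(chunk));
        }
        ready_.notify_one();
    }

    // Signals that no more chunks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
    }

    // Blocks until a chunk is available. Returns false once the queue has
    // been closed and fully drained.
    bool pop(std::vector<int16_t>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !chunks_.empty(); });
        if (chunks_.empty()) return false;
        out = std::move(chunks_.front());
        chunks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::vector<int16_t>> chunks_;
    bool closed_ = false;
};

// Hypothetical stand-in for a streaming state; it only counts samples.
struct FakeStream {
    std::size_t samples_fed = 0;
};

// Hypothetical stand-in for the real feed call (assumption: the size
// argument is a sample count, not a byte count).
void feed_audio(FakeStream& stream, const int16_t* buffer, unsigned int n) {
    (void)buffer;
    stream.samples_fed += n;
}

// Runs a producer (standing in for the audio callback) and one consumer that
// is the only thread calling feed_audio. Returns the total samples fed.
std::size_t run_pipeline(std::size_t chunk_count, std::size_t chunk_samples) {
    ChunkQueue queue;
    FakeStream stream;

    std::thread consumer([&] {
        std::vector<int16_t> chunk;
        while (queue.pop(chunk)) {
            feed_audio(stream, chunk.data(),
                       static_cast<unsigned int>(chunk.size()));
        }
    });

    for (std::size_t i = 0; i < chunk_count; ++i) {
        queue.push(std::vector<int16_t>(chunk_samples, 1));
    }
    queue.close();
    consumer.join();  // join makes the worker's writes visible here
    return stream.samples_fed;
}
```

Calling `run_pipeline(10, 1600)` pushes ten chunks of 1600 samples through the queue. In a real program the stream creation, feeding, decoding and teardown would all move into the worker, so the audio callback never calls into the library directly.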

The stack trace is:
libstt.so!00007ff8b5514a60()
libstt.so!00007ff8b55149b2()
libstt.so!00007ff8b55149b2()
libstt.so!00007ff8b55197ca()
libstt.so!00007ff8b551edaa()
libstt.so!00007ff8b551eb4b()
libstt.so!00007ff8b551fbd5()

I also tested replacing the microphone data with ones (a global char buffer full of ones, larger than the size argument of FeedAudioContent) and it still shows the error... I send 16000 samples of 16 bits, if that helps. I guess it may have something to do with threading, thread context and so on... My code is based on client.cc, and when I call the sample functions with stream_size = 16000 everything runs fine; that's why I suspect the threading. But I don't understand why it would not fail in Python and in C# implementations that also use threads... I don't get it...
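One thing worth double-checking with a char test buffer is the unit of the size argument. Assuming it counts 16-bit samples rather than bytes (an assumption to verify against the header in use), 16000 samples need 32000 bytes and represent one second of audio at 16 kHz. This is a small sketch of that arithmetic, not a diagnosis of the crash. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes needed to hold `samples` 16-bit PCM samples.
constexpr std::size_t bytes_for_samples(std::size_t samples) {
    return samples * sizeof(int16_t);
}

// Duration in milliseconds of `samples` mono samples at `sample_rate` Hz.
constexpr std::size_t duration_ms(std::size_t samples, std::size_t sample_rate) {
    return samples * 1000 / sample_rate;
}

// Copies raw bytes into an int16_t buffer, which avoids casting a char
// pointer to a short pointer and any alignment concerns that come with it.
// A trailing odd byte is dropped.
std::vector<int16_t> to_samples(const std::vector<char>& bytes) {
    std::vector<int16_t> samples(bytes.size() / sizeof(int16_t));
    if (!samples.empty()) {
        std::memcpy(samples.data(), bytes.data(),
                    samples.size() * sizeof(int16_t));
    }
    return samples;
}
```

Under that assumption, a char buffer passed with a size of 16000 has to hold at least `bytes_for_samples(16000)` bytes, otherwise the callee would read past its end.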