ccoreilly / vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
Apache License 2.0

Recognizer listens before the event 'result' or 'partialresult' is added #46

Closed: timpuida closed this issue 2 years ago

timpuida commented 2 years ago

Hello! If I say "Hello" and only then run the code below, I still get the result "Hello".

this.recognizer.addEventListener('partialresult', this.getPartialResult);
this.recognizer.addEventListener('result', this.getResult);

Expected: the recognizer should only start listening once the event listener has been added.

I am building a push-to-talk feature where the user presses a button and speaks.

When the user doesn't need the microphone, I thought I would disable the audio tracks like this, but then the recognizer just pauses.

this.mediaStream.getAudioTracks().forEach(track => {
   track.enabled = false;
});

So if the user presses my button again after a long time, the code runs track.enabled = true and the recognizer continues by recognizing the previous (stale) audio instead of the current speech.

Tested on Vue.js

ccoreilly commented 2 years ago

You should not send audio data if you don't want the recognizer to recognize anything. That is how the React demo works.
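
For example, here is a minimal push-to-talk sketch, not library code, just one way to do it in the app: keep the microphone running and only forward audio while the button is held. It assumes the ScriptProcessorNode/acceptWaveform wiring from the demos, an already created mediaStream and recognizer, and a hypothetical #talk button.

let listening = false;
const talkButton = document.querySelector('#talk');
talkButton.addEventListener('mousedown', () => { listening = true; });
talkButton.addEventListener('mouseup', () => { listening = false; });

const audioContext = new AudioContext();
const source = audioContext.createMediaStreamSource(mediaStream);
const processorNode = audioContext.createScriptProcessor(4096, 1, 1);

processorNode.onaudioprocess = (event) => {
    // Nothing is sent while the button is up, so there is no stale audio
    // for the recognizer to flush when the user presses the button again.
    if (listening) {
        try {
            recognizer.acceptWaveform(event.inputBuffer);
        } catch (error) {
            console.error('acceptWaveform failed', error);
        }
    }
};

source.connect(processorNode);
processorNode.connect(audioContext.destination);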

The library's event listeners are not the place for your app's business logic.

timpuida commented 2 years ago

Yeah, I saw the demo example, but the demo has the same problem: after pressing "load", the API can already record before the microphone is unmuted. And if the user says something after unmuting the microphone and clicks "mute" right after speaking, the recognizer only produces the result after the microphone is unmuted again, and that result contains the previous recording.

Steps to reproduce:

Case 1: load => say something => click to unmute => the recognizer understood it and wrote it down (this is not good).

Case 2: the microphone is unmuted => say something and mute the microphone right away (make sure the recognizer didn't write the text yet) => wait a long time => unmute the microphone and stay silent => the recognizer writes the text from the previous recording.

ccoreilly commented 2 years ago

Case 1 is surprising, but I guess the state is not handled well and somehow audio is not sent to the bucket.

Case 2 is expected: the audio was sent to the recognizer, and recognition results for already-sent audio should be sent back even if the audio is muted afterwards.

These are edge cases that can be solved in the app. I still don't think the library should handle those.

Maybe you can share some code from your application; I struggle to understand why this cannot be handled there.
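
For instance, if you keep toggling track.enabled for muting, the app itself can drop late results. A rough sketch (handleTranscript is a hypothetical app function; the message shape is whatever your handlers already receive):

let listening = false;

function mute() {
    listening = false;
    mediaStream.getAudioTracks().forEach((track) => { track.enabled = false; });
}

function unmute() {
    mediaStream.getAudioTracks().forEach((track) => { track.enabled = true; });
    listening = true;
}

function getResult(message) {
    // Results for audio sent before muting can still arrive afterwards;
    // ignore them here so the previous recording never reaches the UI.
    if (!listening) return;
    handleTranscript(message); // hypothetical app-level function
}

recognizer.addEventListener('result', getResult);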

timpuida commented 2 years ago

You are right! That should be handled in the app. I use the code from examples/modern-vanilla. To resolve case 2 I changed this line in recognizer-processor.js:

const data = inputs[0][0];

to:

const data = inputs[0][0] || new Float32Array(128);

Now the recognizer sends the response back immediately even if the audio is muted.
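
For context, this is roughly where that line sits in the worklet. It is a simplified sketch, not the exact contents of recognizer-processor.js (the real example also converts the samples and forwards them to the recognizer's message port, and the registered processor name may differ):

class RecognizerAudioProcessor extends AudioWorkletProcessor {
    process(inputs, outputs, parameters) {
        // When the track is disabled, inputs[0][0] can be missing, so fall
        // back to a zero-filled (silent) 128-sample chunk. Feeding silence
        // keeps frames flowing and lets the recognizer flush results for
        // audio it already received instead of waiting for the next unmute.
        const data = inputs[0][0] || new Float32Array(128);
        // Simplified: post a copy of the chunk to the node's port.
        this.port.postMessage({ data: new Float32Array(data) });
        return true; // keep the processor alive
    }
}

registerProcessor('recognizer-processor', RecognizerAudioProcessor);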