ccoreilly / vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
Apache License 2.0

Two problems when using vosk-browser with non-streaming, separated static waveforms #69

Open lheine10 opened 1 year ago

lheine10 commented 1 year ago

Hi,

I'm trying to use vosk-browser with several static 2-10sec waveforms.

I'm feeding each waveform to the recognizer with acceptWaveformFloat().

There are two problems:

1. There is no way to reset the start/end times of the words. The values keep growing with every new (independent) waveform.

2. The onResult handler is triggered unreliably. It seems to fire automatically if the waveform has enough silence at the end. If that's not the case, the handler isn't triggered at all. I can force a result with retrieveFinalResult(), but for waveforms that do have silence at the end this makes the onResult handler fire twice.
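
For the first problem, one possible workaround is to track how much audio has already been fed to the recognizer and subtract that from each word's times. This is only a sketch: rebaseWordTimes() and durationSeconds() are hypothetical helpers, not part of vosk-browser, and it assumes the reported times grow by exactly the duration of all audio passed in so far (including any padding) and that each word entry has the { word, start, end, conf } shape.

```javascript
// Hypothetical helper, not part of vosk-browser.
// words: array of { word, start, end, conf } entries (assumed shape of the
//        per-word results when word timings are enabled).
// offsetSeconds: total duration of audio fed to the recognizer BEFORE the
//        current waveform (assumption: timestamps accumulate by this amount).
function rebaseWordTimes(words, offsetSeconds) {
  return words.map((w) => ({
    ...w,
    // Clamp at 0 so small rounding differences never yield negative times.
    start: Math.max(0, w.start - offsetSeconds),
    end: Math.max(0, w.end - offsetSeconds),
  }));
}

// Hypothetical bookkeeping helper: duration of a clip in seconds.
function durationSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}
```

After each waveform, the caller would add durationSeconds(samples.length, sampleRate) to a running offset and pass that offset in for the next result. If the assumption about accumulating timestamps does not hold for a given model, the offsets will be wrong.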

erikh2000 commented 1 year ago

@lheine10, another workaround - admittedly, not ideal...

You can pass silence samples to acceptWaveformFloat() to induce the final result.

I'm not a maintainer/official project person, so don't interpret the suggestion as "we won't fix it".

erikh2000 commented 1 year ago

Example of forcing a result:

// kaldiSampleRate = whatever kaldiRecognizer was constructed with.
const silenceSamples = createSilenceSamples(kaldiSampleRate, 2000);
kaldiRecognizer.acceptWaveformFloat(silenceSamples, kaldiSampleRate);

Code for createSilenceSamples(): https://github.com/erikh2000/sl-web-audio/blob/main/src/generating/silenceUtil.ts
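
For readers who don't want to follow the link, a stand-in for createSilenceSamples() could look roughly like this. It is a minimal sketch, not the linked sl-web-audio code: it assumes the second argument is a duration in milliseconds (inferred from the 2000 in the call above) and that silence is simply a zero-filled Float32Array.

```javascript
// Sketch only; NOT the linked sl-web-audio implementation.
// sampleRate: samples per second, same rate the recognizer was created with.
// durationMs: length of silence in milliseconds (assumed unit).
function createSilenceSamples(sampleRate, durationMs) {
  const sampleCount = Math.round((sampleRate * durationMs) / 1000);
  // Float32Array is zero-initialized, and 0.0 is digital silence.
  return new Float32Array(sampleCount);
}
```

With a 16000 Hz recognizer, createSilenceSamples(16000, 2000) gives 32000 zero samples, i.e. two seconds of silence to append after each clip.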