ccoreilly / vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk
Apache License 2.0
367 stars 60 forks

Alternative? #81

Open msqr1 opened 5 months ago

msqr1 commented 5 months ago

This is not really an issue, but I rebuilt the repo from scratch using newer web technologies and features: https://github.com/msqr1/Vosklet. Can I merge some of those changes into this repo? There is a lot that could be improved, as this one is getting outdated. @ccoreilly

ccoreilly commented 5 months ago

Hi @msqr1 ! Great initiative :) Software evolves and needs to be maintained. I do not have time to dedicate to this repository, so it is good that better alternatives emerge and gain traction.

I'll have a deeper look at your work later this week. In the end, users decide based on the developer experience and the features of these libraries, so I'd be interested in what other users like @Yahweasel or @erikh2000 think.

Yahweasel commented 5 months ago

The core thing I need out of vosk-browser is to not have an AudioContext-level API. I do all of my own audio capturing and ten other layers of processing. Further, although in my own project I do use threads, so SharedArrayBuffer is a nonissue, it's valuable to have a version that runs synchronously, because some users (including myself) manage their own threads. I would rather have a vosk running synchronously with a Worker thread I created on my own than running asynchronously with a Worker thread created by a library. To excessively toot my own horn, my own libav.js allows the user to load it in a synchronous mode, a worker mode, or a threaded mode, and provides the same API in all three.

Basically: I wouldn't mind a more up-to-date vosk adapter, but as it stands, your API is too opinionated for me.

msqr1 commented 5 months ago

You're right. I tried to make this as easy to use as possible: just some minimal setup and you can start recognizing. I agree that more features should be added, but as this is the first version, I want it to be as fast and easy to set up as possible. Other use cases can be addressed later.

erikh2000 commented 5 months ago

@msqr1 I'm interested in your project, but I'm likely to stick with vosk-browser out of inertia and not having any complaints with it. The main thing I saw in Vosklet that I'd like to see in vosk-browser, if practical, is more of the Vosk functions exposed. I had told myself that at some point I'd get vosk-browser building and try to contribute that myself, but I never got around to it.

The faster processing time is intriguing too. What kind of metrics are you seeing?

msqr1 commented 5 months ago

I didn't really measure it, ngl, so maybe I should remove that line. But I moved hot operations like free and mapping input data to C++, I use a simpler mechanism to communicate between JS and C++, I use the newer, faster Emscripten WasmFS and the new emmalloc, and I turned on O3, LTO, SIMD, non-trapping float-to-int conversion, and more. As such, I think it should be faster. You're right, I shouldn't claim anything without benchmarks.
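For anyone wanting to put numbers behind a speed claim like the one discussed above, here is a minimal sketch of a real-time-factor measurement. It assumes the recognizer can be driven through a synchronous per-chunk callback; `measureRtf`, `ChunkProcessor`, and the injectable `now` clock are hypothetical names for illustration, not part of Vosklet or vosk-browser.

```typescript
// Hypothetical harness: nothing here is a real Vosklet or vosk-browser API.
// It times any synchronous function that consumes one audio chunk at a time.
type ChunkProcessor = (chunk: Float32Array) => void;

interface RtfResult {
  audioSeconds: number;   // duration of the audio that was fed in
  wallSeconds: number;    // time spent processing it
  realTimeFactor: number; // wallSeconds / audioSeconds; below 1 is faster than real time
}

function measureRtf(
  processChunk: ChunkProcessor,
  samples: Float32Array,
  sampleRate: number,
  chunkSize: number = 4096,
  // Clock returning milliseconds. Injectable so the harness can be tested;
  // in a browser, () => performance.now() gives finer resolution.
  now: () => number = () => Date.now(),
): RtfResult {
  const start = now();
  for (let offset = 0; offset < samples.length; offset += chunkSize) {
    // subarray creates a view without copying and clamps at the end of the buffer
    processChunk(samples.subarray(offset, offset + chunkSize));
  }
  const wallSeconds = (now() - start) / 1000;
  const audioSeconds = samples.length / sampleRate;
  return { audioSeconds, wallSeconds, realTimeFactor: wallSeconds / audioSeconds };
}
```

Running the same recording through both libraries with a harness like this, several times and on the same machine, would give a comparable figure; a single run says little because of JIT warm-up and model loading.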

erikh2000 commented 5 months ago

No worries, @msqr1. I don't expect you to be super-scientific in your claims. I was just curious about what kind of speed increase you might be seeing. Your changes for performance seem promising.

Yahweasel commented 5 months ago

FYI, simd will do not a damned thing (other than make it not work on Safari) unless the code is specifically written to use it. wasm simd is broadly compatible with x86 simd, but only the C API, and nobody uses the C API. I would be stunned to learn that that's gaining you anything. I had a simd version of libav.js for years and finally ditched it because it wasn't actually beneficial.

msqr1 commented 5 months ago

Well, the thing is kaldi just refuses to compile with simd off, so I have to turn it on. It may or may not do anything though.

Yahweasel commented 5 months ago

Oh, well that's just lovely X-D

msqr1 commented 5 months ago

Just curious, how do you use a speech recognition library with your libav project? Isn't that for audio formats?

Yahweasel commented 5 months ago

I do not. I use both in Ennuicastr.

msqr1 commented 5 months ago

I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

Yahweasel commented 5 months ago

> I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

We're on an issue submitted to a synchronous version of the same API ;)

msqr1 commented 5 months ago

I can't see how the recognizer is synchronous. It can't block the one thread that is controlling it. Can I take a look at the issue? Maybe there is something I can do. Keep in mind that even if the recognizer is asynchronous, you can bind event listeners to it and call setXXX on it synchronously. The only asynchronous part is the recognition process itself:

Yahweasel commented 5 months ago

The API of Vosk just takes a chunk at a time. That API is synchronous.
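To illustrate the point about a chunk-at-a-time API, here is a rough sketch of why "how do you stop it?" has a simple answer in that model: the caller owns the loop, so stopping just means not feeding the next chunk. The `ChunkRecognizer` interface and `StubRecognizer` class are made up for this example (the stub only counts non-silent chunks); they are not the API of Vosk, vosk-browser, or Vosklet.

```typescript
// Hypothetical interface; the method names are illustrative only.
interface ChunkRecognizer {
  acceptChunk(chunk: Float32Array): boolean; // true when an utterance has just ended
  takeResult(): string;                      // returns and clears the finished utterance
}

// Stand-in recognizer: a silent chunk after voiced chunks ends an "utterance".
class StubRecognizer implements ChunkRecognizer {
  private voicedChunks = 0;

  acceptChunk(chunk: Float32Array): boolean {
    const voiced = chunk.some((s) => Math.abs(s) > 0.01);
    if (voiced) {
      this.voicedChunks++;
      return false;
    }
    return this.voicedChunks > 0;
  }

  takeResult(): string {
    const text = `utterance of ${this.voicedChunks} chunk(s)`;
    this.voicedChunks = 0;
    return text;
  }
}

// Each call blocks only for one chunk. The caller decides when to stop,
// so no task queue or cancellation mechanism is needed.
function recognizeAll(
  rec: ChunkRecognizer,
  chunks: readonly Float32Array[],
  shouldStop: () => boolean = () => false,
): string[] {
  const results: string[] = [];
  for (const chunk of chunks) {
    if (shouldStop()) break;
    if (rec.acceptChunk(chunk)) results.push(rec.takeResult());
  }
  return results;
}
```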

msqr1 commented 5 months ago

I get it, but wouldn't that block it from doing anything else? I can certainly add an acceptWaveformSync() that recognizes on the same thread (blocking it) and returns the result. Would that fit your use case? Ngl, a fully synchronous API is even easier than the current one: I only need to translate it over, without managing task queues and other stuff.

Yahweasel commented 5 months ago

My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats. For some people, that's presumably very useful. For me, that's actively unhelpful.

Also, to be clear: you should not be writing your code to fit my use case if that doesn't help you in any way. I'm perfectly happy with vosk-browser, and have no urgent need for a more updated version, though as a general principle I'd like for things to be up to date. I'm only presenting my case on this thread because I was asked to.
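The Worker setup described in this comment, where every step takes a Float32Array in and hands one back, can be sketched as a plain synchronous pipeline. This is only an illustration of the shape, not Ennuicastr's actual code: `Stage`, `runPipeline`, and the placeholder stages (`applyGain`, `hardClip`, `tap`) are hypothetical names.

```typescript
// A stage is any synchronous function from raw audio to raw audio.
type Stage = (audio: Float32Array) => Float32Array;

// Runs the stages in order on the calling thread; no threading is hidden inside.
function runPipeline(stages: readonly Stage[], input: Float32Array): Float32Array {
  let audio = input;
  for (const stage of stages) {
    audio = stage(audio);
  }
  return audio;
}

// Placeholder stages standing in for echo cancellation, noise suppression, etc.
const applyGain = (gain: number): Stage => (audio) => audio.map((s) => s * gain);
const hardClip: Stage = (audio) => audio.map((s) => Math.max(-1, Math.min(1, s)));

// A pass-through stage that hands the audio to a side consumer, which is where
// a synchronous recognizer call could sit, then returns the audio unchanged.
const tap = (sink: (audio: Float32Array) => void): Stage => (audio) => {
  sink(audio);
  return audio;
};
```

With this shape, the code that owns the Worker calls `runPipeline` from its own message handler, which is why a library that creates its own Worker or captures audio itself does not fit.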

msqr1 commented 5 months ago

> My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats.

No, I just wanted to find out how you use it, to see in which use cases a synchronous vosk would be needed, so thanks for the information! The above really helped me learn!

Yahweasel commented 5 months ago

I can be totally precise: https://github.com/ennuicastr/ennuicastr/blob/3b3830fc979b039c245429a5ec7657594af4a705/awp/ennuicastr-worker.ts#L786

There's my call to acceptWaveformFloat :)

msqr1 commented 5 months ago

I completely understand it now :)))))))

msqr1 commented 5 months ago

@ccoreilly did you go over it?

Utopiah commented 3 months ago

FWIW I'd also be interested in an "updated" alternative that is actively maintained. Yet I would need to better understand how the alternative differs.

If it is entirely compatible, e.g

even without providing any improvement, I would probably be interested.

Yet, if it does have any trade-offs, e.g. it breaks compatibility in some contexts (older browsers, Chromium only, etc.), then IMHO they should be made explicit.

PS: to clarify, even though https://github.com/ccoreilly/vosk-browser/tree/master/examples/modern-vanilla is 2 years old, it works for me even in rather "exotic" contexts, e.g. the Oculus browser for WebXR.

msqr1 commented 3 months ago

I made Vosklet as an alternative. You may want to check it out, @Utopiah! It does need SABs (SharedArrayBuffers), though. I could make it SAB-less, but I think that is just too much work.
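Since the SharedArrayBuffer requirement is exactly the kind of trade-off asked about earlier in the thread, a small feature check may help readers decide. In browsers, SharedArrayBuffer is only available on cross-origin isolated pages, which in practice means the server must send suitable Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers. The sketch below assumes a browser-like global object; `canUseSharedMemory` and `EnvLike` are hypothetical names, not part of any library mentioned here.

```typescript
// Minimal view of the globals this check cares about (hypothetical type).
interface EnvLike {
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// True only when the constructor exists AND the page is cross-origin isolated.
// Outside browsers (for example in Node.js) crossOriginIsolated is undefined,
// so this returns false there; the check is meant for browser pages and workers.
function canUseSharedMemory(env: EnvLike = globalThis as unknown as EnvLike): boolean {
  return typeof env.SharedArrayBuffer === "function" && env.crossOriginIsolated === true;
}
```

A page could call this before loading a SAB-dependent build and fall back to a library without that requirement when it returns false.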