ReadAlongs / SoundSwallower

An even smaller speech recognizer / force aligner

WebRTC VAD code considered harmful (on the browser) #38

Status: Open. Opened by dhdaines 1 year ago.

Since this code already exists somewhere in the guts of the browser, it is pretty silly to compile a separate copy of it into WebAssembly. Unfortunately, there is no API that exposes it to JavaScript, so we are stuck doing our own voice activity detection (VAD) for endpointing.
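To give a concrete sense of what "doing our own VAD" involves, here is a minimal frame-energy detector in Python. It is a deliberately simple stand-in, not the WebRTC or PocketSphinx algorithm. The function names, the 160-sample frames (10 ms at 16 kHz), the 10 dB margin and the noise-floor smoothing factor are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def frame_log_energy(frame: Sequence[float]) -> float:
    """Natural log of the mean squared amplitude, floored to avoid log(0)."""
    power = sum(x * x for x in frame) / max(len(frame), 1)
    return math.log(max(power, 1e-10))


def energy_vad(samples: Sequence[float], frame_len: int = 160,
               threshold_db: float = 10.0, alpha: float = 0.95) -> List[bool]:
    """Return one speech/non-speech flag per complete frame (hypothetical helper).

    A frame counts as speech when its log energy exceeds a running
    noise-floor estimate by more than threshold_db decibels. The noise
    floor is updated (exponential smoothing with alpha) only on frames
    classified as non-speech.
    """
    # Convert the dB margin to natural-log power units: ln(p1/p2) = dB * ln(10) / 10.
    threshold = threshold_db * math.log(10) / 10
    flags: List[bool] = []
    noise_floor = None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        e = frame_log_energy(samples[start:start + frame_len])
        if noise_floor is None:
            noise_floor = e  # assume the first frame is background noise
        is_speech = e - noise_floor > threshold
        if not is_speech:
            noise_floor = alpha * noise_floor + (1 - alpha) * e
        flags.append(is_speech)
    return flags
```

For example, `energy_vad([0.001] * 800 + [0.5] * 800 + [0.001] * 800)` flags the five loud frames in the middle as speech and the quiet frames on either side as non-speech. Real detectors are considerably more robust to changing noise than a single energy threshold.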

The problems with the WebRTC code used in PocketSphinx 5 are:

For these reasons, the ideal solution is, horror of horrors, something very much like the -remove_silence option in PocketSphinx, which was the whole reason for creating SoundSwallower in the first place (I was seriously annoyed at it removing data from the input, which makes force alignment useless). Of course, it has to be done in a way that keeps endpointing optional and doesn't break the batch-mode API. So, specifically:
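One way to read these requirements is that endpointing should report where utterances start and end without ever discarding samples, and should be easy to switch off entirely for batch mode. The following is a minimal sketch under that assumption: it takes per-frame speech flags from some VAD and returns frame ranges into the original input. The names `find_utterances`, `min_speech` and `hangover` are hypothetical and not part of the SoundSwallower API.

```python
from typing import List, Optional, Sequence, Tuple


def find_utterances(flags: Sequence[bool], min_speech: int = 3,
                    hangover: int = 5,
                    endpointing: bool = True) -> List[Tuple[int, int]]:
    """Turn per-frame speech flags into (start, end) frame ranges, end exclusive.

    The audio itself is never modified or dropped: callers only get indices
    into the original input, so a force aligner can still see every frame.
    With endpointing=False (batch mode), the whole input is one utterance.
    """
    n = len(flags)
    if not endpointing:
        return [(0, n)] if n > 0 else []
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    run = 0      # consecutive speech frames seen while not in an utterance
    silence = 0  # consecutive non-speech frames seen while in an utterance
    for i, is_speech in enumerate(flags):
        if start is None:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                start = i - run + 1  # back-date to the first speech frame of the run
                silence = 0
        else:
            silence = 0 if is_speech else silence + 1
            if silence > hangover:
                # Close the utterance, keeping the trailing hangover frames as context.
                segments.append((start, i))
                start, run = None, 0
    if start is not None:
        segments.append((start, n))  # input ended mid-utterance
    return segments
```

Because the result is just a list of index ranges, a caller that wants the old batch behaviour can ignore it (or pass `endpointing=False`), and a caller that wants endpointing can multiply the frame indices by the frame length to get sample offsets into the untouched buffer.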

Internally, we can use either the WebRTC method, which is based on log-spectra, or the PocketSphinx 5prealpha method.
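For a rough idea of what a log-spectral VAD feature looks like, the sketch below computes the log energy of one frame in a handful of frequency bands, which a classifier could then compare against running noise statistics. The band edges, the 8 kHz sample rate and the Hann window are assumptions made for illustration; this is not the actual WebRTC feature extraction.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Hypothetical band edges in Hz; not the ones WebRTC or PocketSphinx use.
BANDS: List[Tuple[float, float]] = [(80, 250), (250, 500), (500, 1000),
                                    (1000, 2000), (2000, 4000)]


def subband_log_energies(frame: Sequence[float], sample_rate: int = 8000,
                         bands: Sequence[Tuple[float, float]] = BANDS) -> List[float]:
    """Log power in each frequency band of one frame (Hann window, direct DFT)."""
    n = len(frame)
    # Hann window to reduce spectral leakage between neighbouring bands.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Power spectrum for bins 0..n/2 via a direct O(n^2) DFT (fine for a sketch).
    power = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        power.append(abs(s) ** 2)
    bin_hz = sample_rate / n
    result = []
    for lo, hi in bands:
        # Sum the power of every bin whose centre frequency falls in [lo, hi).
        p = sum(pw for k, pw in enumerate(power) if lo <= k * bin_hz < hi)
        result.append(math.log(p + 1e-10))
    return result
```

For a 256-sample frame of a 750 Hz sine at 8 kHz, the 500 to 1000 Hz band comes out with the largest log energy. A real implementation would use an FFT rather than this direct DFT.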