alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Replace deprecated onaudioprocess in Angular client demo #126

Open · RoboticWater opened this issue 3 years ago

RoboticWater commented 3 years ago

I've been trying to implement a client in Svelte that communicates with the asr_server.py example via WebSocket, and while I can successfully connect with the following code:

const resampler = new Resampler(44100, 16000, 1, 50 * 1024, false); // 16000 because the model and asr-server are both set to 16 kHz; note the 44100 input rate is hardcoded, while the AudioContext's actual sampleRate may differ

const ctx = new AudioContext();
const input = ctx.createMediaStreamSource(stream);
const userSpeechAnalyser = ctx.createAnalyser();
input.connect(userSpeechAnalyser);
const node = input.context.createScriptProcessor(4096, 1, 1);
node.onaudioprocess = (e) => {
    const samples = resampler.resampler(e.inputBuffer.getChannelData(0));
    const dataview = encodeRAW(samples);
    const audioBlob = new Blob([dataview], { type: "audio/raw" });
    socket.send(audioBlob);
};
input.connect(node);
node.connect(input.context.destination); // connected to the destination so the node keeps processing
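
For context, encodeRAW isn't defined anywhere in this issue; a minimal sketch of what such a helper might look like, assuming it converts Float32 samples in [-1, 1] to 16-bit little-endian PCM (this implementation is a guess, not code from the thread):

// Hypothetical sketch of encodeRAW: Float32 samples in [-1, 1] -> 16-bit LE PCM
function encodeRAW(samples) {
    const view = new DataView(new ArrayBuffer(samples.length * 2));
    for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
        view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
    }
    return view;
}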

I'm told that node.onaudioprocess is deprecated. Ideally, this could be updated.

I've looked into how I might do this, but to no avail. First, I tried replacing this code with an AudioWorklet, but AudioWorklets are probably the most convoluted replacement for ScriptProcessorNode I could imagine, and I haven't been able to get one to work.

Next, I tried using a MediaRecorder, which can produce Blobs and send them to the server, but the results come back empty. I suspect the issue is that I'm not resampling the audio or something. The code for that is here:

const recorder = new MediaRecorder(stream, {
    audioBitsPerSecond: 16000, // encoder bitrate in bits/s, not a sample rate
    audioBitrateMode: "constant",
});
recorder.addEventListener("dataavailable", async ({ data }) => {
    socket.send(data);
});

recorder.start(1000); // emit a Blob roughly every 1000 ms
return recorder;

I'd appreciate it if someone could help me figure this out. I don't really care how it's implemented, so long as the method isn't deprecated.

nshmyrev commented 3 years ago

Worklets are the proper way. Unfortunately, someone has to look into it.

nshmyrev commented 3 years ago

I suspect the issue is that I'm not resampling the audio or something.

MediaRecorder sends Opus format (in a WebM/Ogg container), not wav. You probably need to decode it on the server with ffmpeg.

Also, for best responsiveness I'd recommend looking at the WebRTC example instead.
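
For illustration, a minimal sketch of that decoding idea as a hypothetical Node.js relay (the real server in this repo is asr_server.py, and recognizerSocket is an assumed name): ffmpeg reads the compressed Opus chunks on stdin and writes raw 16 kHz 16-bit mono PCM to stdout, which can then be forwarded to the recognizer.

const { spawn } = require("child_process");

// decode MediaRecorder's Opus chunks into raw 16 kHz 16-bit mono PCM
const ffmpeg = spawn("ffmpeg", [
    "-i", "pipe:0",   // compressed WebM/Ogg Opus stream in on stdin
    "-f", "s16le",    // raw signed 16-bit little-endian PCM out
    "-ar", "16000",   // resample to 16 kHz to match the model
    "-ac", "1",       // downmix to mono
    "pipe:1",         // PCM out on stdout
]);

// forward decoded PCM chunks to the recognizer (recognizerSocket is hypothetical)
ffmpeg.stdout.on("data", (pcm) => recognizerSocket.send(pcm));

// write each Blob the client's MediaRecorder produces to ffmpeg's stdin, e.g.:
// ffmpeg.stdin.write(Buffer.from(await blob.arrayBuffer()));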

RoboticWater commented 3 years ago

OK, I got AudioWorklets to work for me, though it's probably not the prettiest or most efficient method, nor did I implement it in Angular.

Within the App's js (for me, that's App.svelte):

const resampler = new Resampler(44100, 16000, 1, 50 * 1024, false);

const context = new AudioContext();
const input = context.createMediaStreamSource(stream);
await context.audioWorklet.addModule("test.js"); // this will be the path to the audioworklet file in the build directory
const node = new AudioWorkletNode(context, "testworklet");
node.port.onmessage = (event) => {
    // each message carries one block of Float32 samples from the worklet
    const samples = resampler.resampler(event.data.audio);
    const dataview = encodeRAW(samples);
    const audioBlob = new Blob([dataview], { type: "audio/raw" });
    socket.emit("audio", audioBlob);
};
input.connect(node).connect(context.destination);

Within test.js (this file must be placed in the build directory manually, since I'm not referencing it in any other file for a bundler to take care of):

class TestWorklet extends AudioWorkletProcessor {
    static get parameterDescriptors() {
        return [{
            name: 'test', // unused placeholder parameter
        }];
    }

    constructor(options) {
        super(options);
        this.port.onmessage = event => { }; // no messages expected from the main thread
    }

    process(inputs, outputs, parameters) {
        // post each render quantum of the first input's first channel to the main thread
        this.port.postMessage({ audio: inputs[0][0] });
        return true; // keep the processor alive
    }
}

registerProcessor('testworklet', TestWorklet);
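
One caveat with this approach: process() is called with 128-sample render quanta, so the worklet posts to the main thread far more often than the old 4096-sample ScriptProcessorNode fired. A minimal sketch of buffering inside the worklet before posting (the class name, chunk size, and buffering logic are assumptions, not code from this thread):

class BufferedWorklet extends AudioWorkletProcessor {
    constructor(options) {
        super(options);
        // accumulate 128-sample render quanta into 4096-sample chunks
        this.buffer = new Float32Array(4096);
        this.offset = 0;
    }

    process(inputs, outputs, parameters) {
        const block = inputs[0][0];
        if (block) {
            this.buffer.set(block, this.offset); // 4096 is a multiple of 128, so this never overflows
            this.offset += block.length;
            if (this.offset >= this.buffer.length) {
                this.port.postMessage({ audio: this.buffer.slice(0) }); // copy, so reusing the buffer is safe
                this.offset = 0;
            }
        }
        return true; // keep the processor alive
    }
}

registerProcessor('bufferedworklet', BufferedWorklet);

The main-thread side would then construct the node with new AudioWorkletNode(context, 'bufferedworklet') instead.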

The file structure looks roughly like this:

public
├── index.html
├── test.js
└── build
    └── bundle.js
src
└── App.svelte

And here's where I referenced the code: https://stackoverflow.com/a/62732195

RoboticWater commented 3 years ago

I suspect the issue is that I'm not resampling the audio or something.

MediaRecorder sends Opus format (in a WebM/Ogg container), not wav. You probably need to decode it on the server with ffmpeg.

Also, for best responsiveness I'd recommend looking at the WebRTC example instead.

Yeah, I'm making a proof-of-concept app, so I just need something that works. The app was going to use websockets anyway, so I figured I might as well use that. I'll look into WebRTC a little later if the lag is too much for testing, but for now it's bearable.