alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Recording Audio in the Browser Using the MediaStream #102

Open zhurlik opened 3 years ago

zhurlik commented 3 years ago

Hi,

I'm trying to capture and record voice in the browser. It looks like modern browsers (at least Firefox and Chrome) don't support recording audio/wav. Is it possible to configure vosk-server to handle the MIME types audio/ogg or audio/webm as well?

Thanks, Vlad

nshmyrev commented 3 years ago

Looks like modern browsers (at least Firefox and Chrome) don't support recording audio/wav.

They do; you can find a demo here:

https://github.com/alphacep/vosk-server/tree/master/client-samples/angular-demo
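MediaRecorder itself typically emits compressed containers, but a browser can still produce WAV: one approach is to take raw samples from the Web Audio API and write the WAV header yourself. The following is a minimal sketch that assumes mono Float32 input in the range -1..1. `encodeWav` is a hypothetical helper, not code taken from the angular-demo.

```javascript
// Hypothetical helper: pack mono Float32 samples (range -1..1), e.g. collected
// via the Web Audio API, into a 16-bit PCM WAV file (44-byte header + data).
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  };

  // RIFF header
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // file size minus 8 bytes
  writeString(8, 'WAVE');
  // "fmt " sub-chunk
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // sub-chunk size for PCM
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  // "data" sub-chunk
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale it to a signed 16-bit integer
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser the result could be wrapped as `new Blob([encodeWav(samples, 16000)], {type: 'audio/wav'})`. When streaming to the server, sending only the PCM payload (the bytes after the 44-byte header) avoids feeding the header bytes to the recognizer as audio.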

nshmyrev commented 3 years ago

For compressed audio transfer, you can check the WebRTC example:

https://github.com/alphacep/vosk-server/tree/master/webrtc

zhurlik commented 3 years ago

Thank you for your responses. I will check the angular-demo; as I recall, I saw an open issue about it. Regarding WebRTC, I am going to look into it. Do you have a Dockerfile for running the WebRTC vosk-server?

In the meantime, maybe you can find the problem; here is a simple JavaScript snippet that doesn't work in Chrome:

const webSocket = new WebSocket('ws://localhost:2700');

webSocket.onopen = (event) => {
    console.log('>> Connected');
    navigator.mediaDevices.getUserMedia({audio: {sampleRate: 8000}, video: false})
        .then((stream) => {
            const recordedChunks = [];
            // MediaRecorder encodes compressed audio (by default webm/opus in
            // Chrome, ogg/opus in Firefox), not raw PCM
            const mediaRecorder = new MediaRecorder(stream);

            mediaRecorder.addEventListener('dataavailable', (e) => {
                if (e.data.size > 0) {
                    recordedChunks.push(e.data);
                }
            });

            mediaRecorder.addEventListener('stop', () => {
                console.log('>> Sending to server...');
                webSocket.send(new Blob(recordedChunks));
                webSocket.send(JSON.stringify({eof: 1}));
            });

            console.log('>> Start recording...');
            mediaRecorder.start();

            setTimeout(() => {
                console.log('>> Stop recording...');
                mediaRecorder.stop();
            }, 3000);
        })
        .catch((err) => {
            /* handle the error */
            console.error(err);
        });
}

webSocket.onmessage = (event) => {
    console.log(">> Response:");
    console.log(event.data);
}

webSocket.onclose = (event) => {
    // This can happen if the blob was too big
    // E.g. "Frame size of 65580 bytes exceeds maximum accepted frame size"
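    // CloseEvent fields live on the prototype, so JSON.stringify(event) would print only {"isTrusted":true}
    console.log(">> OnClose: code=" + event.code + ", reason=" + event.reason);
}

webSocket.onerror = (event) => {
    // Error events carry no message text; log the event object itself
    console.error(">> Error:", event);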
}
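The `onclose` comment above mentions a frame-size limit. Sending the whole recording as one message can exceed it, so the data can be split into several smaller binary messages instead. This is a hedged sketch: `chunkBuffer` and `sendInChunks` are hypothetical helpers, and the 8000-byte chunk size is an assumption, not a documented vosk-server limit.

```javascript
// Hypothetical helper: yield views of at most maxBytes bytes each, so every
// WebSocket message stays under the receiving side's frame limit.
// The 8000-byte default is an assumption, not a vosk-server setting.
function* chunkBuffer(buffer, maxBytes = 8000) {
  const bytes = new Uint8Array(buffer);
  for (let offset = 0; offset < bytes.length; offset += maxBytes) {
    // subarray() returns a view onto the same memory, not a copy
    yield bytes.subarray(offset, Math.min(offset + maxBytes, bytes.length));
  }
}

// Hypothetical helper: send a recorded Blob piece by piece, then signal
// end of stream with the same {eof: 1} message used in the snippet above.
async function sendInChunks(webSocket, blob, maxBytes = 8000) {
  const buffer = await blob.arrayBuffer();
  for (const chunk of chunkBuffer(buffer, maxBytes)) {
    webSocket.send(chunk); // a typed-array view sends only its own bytes
  }
  webSocket.send(JSON.stringify({eof: 1}));
}
```

This only avoids oversized frames. It does not change the audio format, so a MediaRecorder blob sent this way is still webm/ogg data.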

nshmyrev commented 3 years ago

In the meantime maybe you could find the problem however that's a simple JavaScript that doesn't work in the Chrome:

As far as I know, you cannot open a WebSocket connection from Chrome without an SSL certificate, and you are using ws://.

https://stackoverflow.com/questions/50704614/websocket-connection-fails-on-chrome-without-ssl
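Whether a plain ws:// URL works depends on how the page is served. Browsers block ws:// connections from pages loaded over HTTPS as mixed content, and getUserMedia requires a secure context (HTTPS or localhost). The sketch below picks the scheme from the page's protocol. `voskSocketUrl` is a hypothetical helper, and using wss:// assumes vosk-server is reachable over TLS in your deployment (for example, behind a TLS-terminating proxy).

```javascript
// Hypothetical helper: use wss:// when the page is served over HTTPS
// (insecure ws:// would be blocked as mixed content), ws:// otherwise.
// Port 2700 matches the snippet above; adjust for your deployment.
function voskSocketUrl(pageProtocol, host, port = 2700) {
  const scheme = pageProtocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${host}:${port}`;
}
```

In the browser this would be called as `new WebSocket(voskSocketUrl(location.protocol, location.hostname))`.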

zhurlik commented 3 years ago

Hi,

I don't think the problem is with sending. In my opinion, the problem is that Chrome/Firefox use an audio format that vosk-server cannot parse. I am attaching vosk-server-test.zip with the scripts and a readme file describing the scenarios I tried. Could you have a look at them? Maybe it will help you understand what I mean. You can see that the WAV files recorded in the browsers don't work, no matter whether I send them to vosk-server via JavaScript in the browser or from the command line using the Python script.

Thanks, Vlad

nshmyrev commented 3 years ago

The audio you recorded with addpipe has a 44100 Hz sample rate, while the server expects 8000 Hz by default. To reconfigure the server, you can send a config message like this:

    await websocket.send('''{"config" : { "sample_rate" : 44100.0}}''')

as in the test_words.py example.
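To avoid guessing the sample rate, a browser client could read it from the WAV header and send the matching config message before the audio. The sketch below assumes a canonical 44-byte header with the `fmt ` chunk directly after `WAVE` (a common layout for simple recorders, but not every WAV file follows it). It also assumes the server wants 16-bit mono PCM. `configMessageForWav` is a hypothetical helper.

```javascript
// Hypothetical helper: read the format fields of a canonical PCM WAV header
// and build a vosk-server config message with the file's sample rate.
function configMessageForWav(arrayBuffer) {
  if (arrayBuffer.byteLength < 44) {
    throw new Error('Too short for a WAV header');
  }
  const view = new DataView(arrayBuffer);
  // Read a 4-character ASCII chunk tag at the given offset
  const tag = (offset) =>
    String.fromCharCode(...new Uint8Array(arrayBuffer, offset, 4));
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') {
    throw new Error('Not a canonical WAV file');
  }
  const audioFormat = view.getUint16(20, true);   // 1 = PCM
  const channels = view.getUint16(22, true);
  const sampleRate = view.getUint32(24, true);
  const bitsPerSample = view.getUint16(34, true);
  if (audioFormat !== 1 || channels !== 1 || bitsPerSample !== 16) {
    throw new Error(`Expected 16-bit mono PCM, got format=${audioFormat}, ` +
                    `channels=${channels}, bits=${bitsPerSample}`);
  }
  return JSON.stringify({config: {sample_rate: sampleRate}});
}
```

The resulting string would be sent with `webSocket.send(...)` right after the connection opens, before any audio bytes.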

In general, it is wasteful to send 44 kHz audio over the network; you'd better resample to 8000 Hz on the client. You can use a JavaScript resampler library for that, as in the Angular demo:

https://github.com/alphacep/vosk-server/blob/1461abc592855dceb436ca8538d7766b0a877216/client-samples/angular-demo/src/assets/recorder-worker.js#L3
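For readers who want to see roughly what client-side downsampling involves, here is a sketch that avoids a library. It uses linear interpolation without an anti-aliasing filter, which is deliberately simpler than a dedicated resampler library and can alias. `downsampleToInt16` is a hypothetical helper, and 8000 Hz is simply the server default mentioned above.

```javascript
// Hypothetical helper: downsample mono Float32 samples (e.g. 44100 or 48000 Hz
// from the microphone) by linear interpolation, then convert to 16-bit PCM.
// No low-pass filter is applied, so content above targetRate / 2 will alias.
function downsampleToInt16(input, inputRate, targetRate = 8000) {
  if (targetRate > inputRate) {
    throw new Error('This sketch only downsamples');
  }
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                  // fractional position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    const sample = input[left] * (1 - frac) + input[right] * frac;
    const clamped = Math.max(-1, Math.min(1, sample));
    output[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return output;
}
```

The returned array's `.buffer` could then be sent as a binary WebSocket message. Typed arrays use the platform's byte order, which is little-endian in practically all browsers.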