alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

ReactJS + WebSocket + Vosk #257

Open fkurushin opened 1 month ago

fkurushin commented 1 month ago

I am writing a React client to recognise speech over WebSockets. I connect to the server (alphacep/kaldi-ru:latest) and send requests without any problems, but the responses come back empty. Can anyone please take a look at my code?

import React, { useRef, useEffect, useState } from 'react';

const URL = 'ws://localhost:2700';
const sampleRate = 16000;
const blockSize = 4000;
const dt = blockSize / sampleRate * 1000;

const SpeechRecoPage = () => {
  const [isRecording, setIsRecording] = useState(false);
  const [mediaRecorder, setMediaRecorder] = useState(null);
  const socket = useRef(null);

  useEffect(() => {
    if (!socket.current) {
      socket.current = new WebSocket(URL);
    };

    socket.current.onopen = function () {
      socket.current.send(JSON.stringify({ config: { sample_rate: sampleRate } }));
      console.log('Connection established');
    };

    socket.current.onmessage = function (event) {
      console.log(`Received message: ${event.data}`);
    };

    socket.current.onclose = function (event) {
      console.log(`Connection closed: ${event.code} ${event.reason}`);
    };

    socket.current.onerror = function (event) {
      console.error('WebSocket error observed:', event);
    };

    setIsRecording(true);
    navigator.mediaDevices
      .getUserMedia({
        video: false,
        audio: {
          echoCancellation: true,
          noiseSuppression: true,
          channelCount: 1,
          sampleRate: sampleRate,
        },
      })
      .then((stream) => {
        const mediaRecorder = new MediaRecorder(stream);
        mediaRecorder.start(dt)
        setMediaRecorder(mediaRecorder);
        mediaRecorder.addEventListener('dataavailable', (event) => {
          if (socket.current.readyState === WebSocket.OPEN) {
            console.log(event.data)
            socket.current.send(event.data);
          }
        });
        // mediaRecorder.start();
      })
      .catch((error) => {
        console.log(`Error accessing microphone: ${error.message}`);
      });

  }, []);

  const closeConnection = () => {
    setIsRecording(false);
    mediaRecorder.stop();
    if (socket.current.readyState === WebSocket.OPEN) {
      socket.current.send('{"eof" : 1}')
    }
  };

  return (
    <div>
      <h1>RECORDING</h1>
      {isRecording ? (
        <button onClick={closeConnection}>Stop Recording</button>
      ) : (
        <button onClick={() => setIsRecording(true)}>Start Recording</button>
      )}
    </div>
  );
};

export default SpeechRecoPage;

These are the client logs:

[Image: Screenshot 2024-07-24 at 10 01 30]

Any help welcome, thank you!

nshmyrev commented 1 month ago

You are trying to send MP4 to the WebSocket, while the server expects raw WAV. You would need to modify the server to handle MP4.

We use an AudioContext worklet to record audio instead; you can find an example here:

https://github.com/alphacep/vosk-server/tree/master/client-samples/javascript

Overall, we recommend the WebRTC server for web clients rather than WebSockets; it is a more appropriate approach:

https://github.com/alphacep/vosk-server/tree/master/webrtc
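
To make "raw wav" concrete on the client side: the Web Audio API hands out samples as 32-bit floats in the range [-1, 1], whereas the server is assumed here to want 16-bit signed PCM at the configured sample rate. The sketch below shows only that conversion step. `floatTo16BitPCM` is a hypothetical helper name, not something taken from the sample client, so treat it as an illustration rather than the project's implementation.

```javascript
// Hypothetical helper: convert Web Audio Float32 samples in [-1, 1]
// to 16-bit signed PCM. Int16Array uses the platform byte order,
// which is little-endian on common hardware.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale by 32768, positive values by 32767,
    // so both ends of the int16 range are reachable.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

Assuming the samples come from an audio worklet or similar source, the resulting buffer could then be sent as a binary frame, for example with `socket.current.send(floatTo16BitPCM(samples).buffer)`.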

fkurushin commented 1 month ago

Hi @nshmyrev, thank you for the answer. I thought that if the server got through process_chunk without errors, it had accepted my request.

def process_chunk(rec, message):
    if message == '{"eof" : 1}':
        return rec.FinalResult(), True
    if message == '{"reset" : 1}':
        return rec.FinalResult(), False
    elif rec.AcceptWaveform(message):
        return rec.Result(), False
    else:
        return rec.PartialResult(), False

Overall, I will try converting to audio/wav first, then the AudioContext approach, and then WebRTC.
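
On why process_chunk raises no error for the wrong format: as far as I understand, the recognizer treats whatever bytes it receives as 16-bit PCM samples and does not validate them, so compressed or container data is simply decoded as noise. The sketch below illustrates that idea in JavaScript, outside of Vosk. `bytesAsInt16LE` is a hypothetical helper, and the input is just the ASCII marker `ftyp` that appears near the start of an MP4 file.

```javascript
// Hypothetical helper: reinterpret an arbitrary byte buffer as
// 16-bit little-endian signed samples. Any input "succeeds";
// nothing checks that the bytes were ever audio.
function bytesAsInt16LE(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = [];
  for (let i = 0; i + 1 < bytes.byteLength; i += 2) {
    samples.push(view.getInt16(i, true)); // true = little-endian
  }
  return samples;
}

// The ASCII bytes "ftyp" (0x66 0x74 0x79 0x70), not audio data.
const notAudio = new Uint8Array([0x66, 0x74, 0x79, 0x70]);
const decoded = bytesAsInt16LE(notAudio);
console.log(decoded); // two large, meaningless "sample" values
```

This is why an empty transcript, rather than an exception, is the typical symptom of sending the wrong audio format.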