alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

How to improve realtime conversion in vosk server? #205

Open nishanth-cn opened 1 year ago

nishanth-cn commented 1 year ago

I have included my client sample code below. I am trying to launch 200 threads, each of which connects to the Vosk server and converts the sample audio file. The conversion takes a long time. Changing the buffer size from 8k to 16k also adds more delay. Is there any way to improve performance?
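For reference, the two buffer sizes can be translated into audio duration per message. This is only a sketch: it assumes the file is raw 16-bit mono PCM at the 16000 Hz sample rate sent in the config message, which the post does not state explicitly, and the class and method names are made up for illustration.

```java
// Sketch only: assumes 16-bit mono PCM; ChunkDuration is a hypothetical helper.
final class ChunkDuration {

    /** Milliseconds of audio carried by one buffer of the given size. */
    static long chunkMillis(int bufferBytes, int sampleRate, int bytesPerSample, int channels) {
        long bytesPerSecond = (long) sampleRate * bytesPerSample * channels;
        return bufferBytes * 1000L / bytesPerSecond;
    }

    public static void main(String[] args) {
        // 8000-byte buffer, as in the client code
        System.out.println(chunkMillis(8000, 16000, 2, 1));
        // 16000-byte buffer, the larger size mentioned above
        System.out.println(chunkMillis(16000, 16000, 2, 1));
    }
}
```

Under those assumptions an 8000-byte buffer holds a quarter of a second of audio and a 16000-byte buffer holds half a second, so each request carries twice as much audio for the server to decode before it can reply.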

Timings observed:
For 1 thread: 4-6s
For 2-4 threads: 6s
For 10 threads: 12s
For 20 threads: 22s
For 200 threads: 3m 45s
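The timings above can be normalized to wall-clock seconds per stream to see how the server scales. This is a rough sketch, assuming each figure is the total time until all threads finished and that every thread transcribes the same file. The helper name is hypothetical.

```java
// Sketch only: divides the reported total time by the number of concurrent streams.
final class PerStreamTime {

    /** Wall-clock seconds spent per stream when all streams run concurrently. */
    static double secondsPerStream(double totalSeconds, int streams) {
        return totalSeconds / streams;
    }

    public static void main(String[] args) {
        // Figures taken from the timings reported in this post (3m 45s = 225s)
        int[] streams = {10, 20, 200};
        double[] totalSeconds = {12, 22, 225};
        for (int i = 0; i < streams.length; i++) {
            System.out.printf("%d streams: %.3f s per stream%n",
                    streams[i], secondsPerStream(totalSeconds[i], streams[i]));
        }
    }
}
```

From 10 threads upward the cost stays at roughly 1.1 to 1.2 seconds per stream, so total time grows about linearly with the number of streams. That pattern would be consistent with the 2 cores already being saturated, although the numbers alone do not prove it.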

Our plan is to use the Vosk server in our product server, which has to handle up to 700 streams at a time. If a single Vosk server cannot handle that many requests, we can deploy multiple servers. But I want to know about the Vosk server's capabilities.

Number of cores: 2

import com.neovisionaries.ws.client.*;
import java.io.*;
import java.nio.file.*;
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;
import java.util.ArrayList;
import org.json.JSONObject;

public class VoskClient extends Thread {

    private ArrayList<String> results = new ArrayList<String>();
    // volatile: assigned in run() and read on the WebSocket listener thread
    private volatile CountDownLatch recieveLatch;
    private WebSocketFactory factory;
    private WebSocket ws;

    public VoskClient() {
        try {
            //setName("Live Caption Thread Ch: " + originalName + "_" + owner);
            factory = new WebSocketFactory();
            ws = factory.createSocket("ws://<myVM'sIP address>:2700");
            ws.addListener(new WebSocketAdapter() {
                @Override
                public void onTextMessage(WebSocket websocket, String message) {
                    //results.add(message);
                    if (message.contains("text")) {
                        JSONObject captionMap = new JSONObject(message);
                        String caption = captionMap.get("text").toString();
                        System.out.println(getName() + " " + caption);
                    }
                    recieveLatch.countDown();
                }
                @Override
                public void onConnected(WebSocket websocket, Map<String, List<String>> headers) throws Exception
                {
                    websocket.getConnectedSocket().setKeepAlive(true);
                }
            });
            ws.connect();
            start();

        } catch (Exception e) {
            e.printStackTrace();
        }

    }

    public void run() {
        try {
            recieveLatch = new CountDownLatch(1);
            ws.sendText("{\"config\" : {\"sample_rate\" : " + 16000 + " }}");
            recieveLatch.countDown();

            // try-with-resources closes the file even if sending fails
            try (InputStream in = new FileInputStream(new File("C:\\dev\\Websocket\\src\\main\\java\\test16k.wav"))) {
                byte[] buf = new byte[8000];
                while (true) {
                    int nbytes = in.read(buf);
                    if (nbytes < 0) break;
                    recieveLatch = new CountDownLatch(1);
                    // send only the bytes actually read; the last chunk is usually shorter than buf
                    ws.sendBinary(java.util.Arrays.copyOf(buf, nbytes));
                    recieveLatch.await();
                }
            }
            recieveLatch = new CountDownLatch(1);
            ws.sendText("{\"eof\" : 1}");
            recieveLatch.await();
            ws.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
        //return results;
    }

    public static void main(String[] args) throws Exception {
        VoskClient client[] = new VoskClient[500];

        for (int i = 0; i < client.length; i++) {
            client[i] = new VoskClient();
        }
    }
}

nshmyrev commented 1 year ago

It depends a lot on server hardware