alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

VOSK CPU usage #1291

Closed nishanth-cn closed 1 year ago

nishanth-cn commented 1 year ago

Hi, thanks for the wonderful voice-to-text library.

I am trying to measure the CPU usage of a Vosk thread, i.e. a single Vosk recognizer. On Windows a recognizer takes barely 3-5% CPU per thread, but on Linux it seems to take about 10 times as much. I am just running the Vosk Java demo code. I tried changing the beam value as mentioned in one of the tickets, but that reduced usage by only 5%. I was expecting Linux to perform better. Is there anything I should tweak to get better performance on Linux?

Windows environment: i7, 4 cores, 1.8-2.3 GHz. Linux virtual environment (cloud): 2 cores, 2 GHz.

image

nshmyrev commented 1 year ago

Feels like something is different but you didn't provide enough details. What app are you running exactly?

nishanth-cn commented 1 year ago

I am building this code and running the DecoderDemo. (https://github.com/alphacep/vosk-api/tree/master/java)

Windows details show 4 threads running. I just ran them to see whether CPU usage peaks.

nshmyrev commented 1 year ago

Vosk definitely does nothing with 4 threads; it should use just 1.

In general this demo is very short. You'd better test on longer and bigger files to see the real picture.
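To compare runs on longer files, one option is to read the CPU time of the decoding thread directly instead of relying on a task manager. The following is only a sketch using the standard `ThreadMXBean`; the class name `ThreadCpuProbe` and the busy loop standing in for the decoding loop are hypothetical, and it assumes the work being measured runs on the calling thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadCpuProbe {

    /** Runs the task on the calling thread and returns {cpuNanos, wallNanos}. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // getCurrentThreadCpuTime() returns -1 if CPU time measurement is disabled.
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {cpuNanos, wallNanos};
    }

    public static void main(String[] args) {
        // Hypothetical workload; replace with the acceptWaveForm loop under test.
        long[] times = measure(() -> {
            double sum = 0;
            for (int i = 1; i < 50_000_000; i++) {
                sum += Math.sqrt(i);
            }
            System.out.println("checksum: " + sum);
        });
        double percentOfOneCore = 100.0 * times[0] / times[1];
        System.out.printf("CPU %.1f ms, wall %.1f ms, %.0f%% of one core%n",
                times[0] / 1e6, times[1] / 1e6, percentOfOneCore);
    }
}
```

Dividing the CPU time by the duration of the audio, rather than by wall time, gives a figure that is easier to compare across machines when decoding from a file.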

nishanth-cn commented 1 year ago

package src.main.java.org.vosk.demo;

import java.io.*;

import javax.sound.sampled.*;
import org.json.JSONObject;
import org.vosk.LogLevel;
import org.vosk.Recognizer;
import org.vosk.LibVosk;
import org.vosk.Model;

public class LiveCaptionDecoder {

public static void main(String[] argv) throws IOException, UnsupportedAudioFileException {
    LibVosk.setLogLevel(LogLevel.DEBUG);
    AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 60000, 16, 2, 4, 44100, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    TargetDataLine microphone;
    try (Model model = new Model("demo/src/main/java/org/vosk/demo/model");

        Recognizer recognizer = new Recognizer(model, 120000)) {

        microphone = (TargetDataLine) AudioSystem.getLine(info);
        microphone.open(format);
        microphone.start();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int numBytesRead;
        int CHUNK_SIZE = 1024;
        int bytesRead = 0;

        int nbytes;
        byte[] b = new byte[4096];

        recognizer.setPartialWords(true);
        while (bytesRead <= 100000000) {
            numBytesRead = microphone.read(b, 0, CHUNK_SIZE);
            bytesRead += numBytesRead;
            if(numBytesRead>0) {
                if (recognizer.acceptWaveForm(b, numBytesRead)) {
                    System.out.println(recognizer.getResult());
                } else {
                    JSONObject partial = new JSONObject(recognizer.getPartialResult());
                    String pcaption = partial.getString("partial");
                    if (pcaption.length() > 0) {
                        System.out.println("partial: " + pcaption);
                    }
                }
            }
        }
        System.out.println(recognizer.getFinalResult());
        microphone.close();

    } catch (LineUnavailableException e) {
        e.printStackTrace();
    }
}

}

nishanth-cn commented 1 year ago

I am using a microphone for real-time conversion. Model: vosk-model-small-en-us-0.15

nishanth-cn commented 1 year ago

Adding more details here. Below is the code I used to compare CPU usage.

package src.main.java.org.vosk.demo;

import java.io.FileInputStream;
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

import org.json.JSONObject;
import org.vosk.LogLevel;
import org.vosk.Recognizer;
import org.vosk.LibVosk;
import org.vosk.Model;

public class DecoderDemoOLD {

    public static void main(String[] argv) throws IOException, UnsupportedAudioFileException {
        LibVosk.setLogLevel(LogLevel.DEBUG);

        try (Model model = new Model("/local_repo/java/demo/src/main/java/org/vosk/demo/model_useng");
                    InputStream ais = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream("/local_repo/java/demo/src/main/java/org/vosk/demo/test1.wav")));
                    Recognizer recognizer = new Recognizer(model, 16000)) {

            int nbytes;
            byte[] b = new byte[1024];
            while ((nbytes = ais.read(b)) >= 0) {
                if (recognizer.acceptWaveForm(b, nbytes)) {
                    System.out.println(recognizer.getResult());
                } else {
                   // System.out.println(recognizer.getPartialResult());
                }
            }

            System.out.println(recognizer.getFinalResult());
        }
    }
}

Windows CPU usage is around 20-25%. However, Linux is hitting 100%. I tried changing the heap size, but it didn't help.

image

nshmyrev commented 1 year ago

AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 60000, 16, 2, 4, 44100, false);

This is certainly wrong: you cannot have stereo input with Vosk; you have to convert to mono.
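A minimal sketch of such a conversion, assuming the captured audio is interleaved 16-bit signed little-endian stereo PCM (as requested by the format above). The class and method names are hypothetical and not part of Vosk; the mono bytes it returns are what would then be passed to `acceptWaveForm`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StereoDownmix {

    /** Averages interleaved 16-bit little-endian stereo frames into mono samples. */
    static byte[] toMono(byte[] stereo, int length) {
        int frames = length / 4; // 2 channels x 2 bytes per sample; a trailing partial frame is dropped
        ByteBuffer in = ByteBuffer.wrap(stereo, 0, frames * 4).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer out = ByteBuffer.allocate(frames * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < frames; i++) {
            int left = in.getShort();
            int right = in.getShort();
            out.putShort((short) ((left + right) / 2));
        }
        return out.array();
    }

    public static void main(String[] args) {
        // One stereo frame: left = 1000, right = 3000.
        byte[] stereo = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) 1000).putShort((short) 3000).array();
        byte[] mono = toMono(stereo, stereo.length);
        short sample = ByteBuffer.wrap(mono).order(ByteOrder.LITTLE_ENDIAN).getShort();
        System.out.println(sample); // prints 2000
    }
}
```

Where the sound card supports it, opening the line as mono in the first place avoids the downmix altogether; either way the sample rate given to the `Recognizer` should match the rate of the audio actually delivered.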

nishanth-cn commented 1 year ago

But it works absolutely fine with the microphone :) My only concern is CPU usage on Linux. The second code sample I shared does not use the microphone; it just uses a mono WAV speech file.

nshmyrev commented 1 year ago

File decoding takes 100% of one core, which is what it is supposed to take. That is 50% of total CPU since you have 2 cores. If you had 8 cores it would be just 12.5%.

I'm not sure which Windows CPU usage figure you are reporting, but you probably have an 8-core host and a 2-core VM, which would agree with the numbers.
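The arithmetic behind these numbers can be written out as a small sketch (the class and method names are hypothetical). It also matters which convention the monitoring tool uses: some tools, such as Linux `top` in its default mode, report a process relative to a single core, while others, such as Windows Task Manager, normalize across all logical cores, so the same load can show up as different percentages.

```java
final class CpuShare {

    /** Share of total machine CPU used when busyCores cores are fully loaded. */
    static double totalPercent(double busyCores, int logicalCores) {
        return 100.0 * busyCores / logicalCores;
    }

    public static void main(String[] args) {
        System.out.println(totalPercent(1, 2)); // prints 50.0
        System.out.println(totalPercent(1, 8)); // prints 12.5
        // Logical core count as seen by the JVM on this machine.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```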

nishanth-cn commented 1 year ago

So my Linux VM with 2 cores tops out at 100% CPU usage. If I assume file decoding is taking 50%, then the other 50% should be speech to text.