alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
7.7k stars 1.08k forks

Run Vosk in a separate thread #1285

Open crmuinos opened 1 year ago

crmuinos commented 1 year ago

Excuse me, I'm trying to build a GUI so that Vosk starts when a button is pushed. Unfortunately I haven't been able to run the recognition when I embed all the code in a function and call that function from a push button.

I can't find the reason: the model loads and the queue is launched, but it seems the data from the microphone is not stored. I've tried to adapt https://github.com/alphacep/vosk-api/blob/master/python/example/test_microphone.py and I think the problem comes from the while True loop, when <> is called, but I'm failing at making it work.

Is it possible to find these methods explained, so I can keep going?

with sd.RawInputStream(samplerate=args.samplerate, blocksize=8000, device=args.device, dtype="int16", channels=1, callback=callback) # I guess this is what stores the microphone audio through a callback.

rec = KaldiRecognizer(model, args.samplerate) # I guess this is the class instance for the recognizer.

data = q.get() # I guess the data is stored on the queue.

rec.AcceptWaveform(data) # I don't know what this does. What is the difference between rec.PartialResult() and rec.Result()?
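For what it's worth, both rec.PartialResult() and rec.Result() return JSON strings: PartialResult() gives an interim hypothesis while an utterance is still in progress, and Result() gives the final hypothesis once AcceptWaveform() has returned True at an utterance boundary. A minimal sketch of pulling the text out of either, using only the stdlib json module (the sample strings are made-up examples of the shapes Vosk returns):

```python
import json

def extract_text(result_json):
    # Final results carry a "text" key; partial results carry "partial".
    d = json.loads(result_json)
    return d.get("text", d.get("partial", ""))

partial = '{"partial": "hola mun"}'  # hypothetical partial result
final = '{"text": "hola mundo"}'     # hypothetical final result

print(extract_text(partial))  # hola mun
print(extract_text(final))    # hola mundo
```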

Kind regards

nshmyrev commented 1 year ago

You'd better show your code and ask on your current issue.

crmuinos commented 1 year ago

Thank you, I attach the .py, zipped. Basically I'm defining a thread_function so it can be called from anywhere, in this case from main, but the intention is to have a GUI, launch the recognizer from a button_click event, and retrieve the recognized speech as text into a label.

But when the thread starts, recognition is not working. Kind regards.

def thread_function(name):
    logging.info("Thread %s: starting", name)
    model = Model(model_name="vosk-model-small-es-0.42")
    cap = pyaudio.PyAudio()
    stream = cap.open(format=pyaudio.paInt16, channels=1, rate=16000,
                      input=True, frames_per_buffer=8192)
    stream.start_stream()
    logging.info("Micro is running")
    rec = KaldiRecognizer(model, 16000)
    rec.SetWords(True)
    transcription = []  # I'd want to store the recognized stream here
    while True:
        data = stream.read(4096)
        if len(data) == 0:
            break

        if rec.AcceptWaveform(data):
            print(rec.Result())  # for debug
            transcription.append(rec.Result())  # for storage
        transcription.append(rec.FinalResult())

    return rec.FinalResult()
if __name__ == "__main__": #In a GUI this can be launched from a button_click
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,),daemon=True)
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    x.join()
    logging.info("Main    : all done")
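Since the goal is to get the recognized text back into a GUI label, one common pattern is to have the worker thread push results into a thread-safe queue.Queue that the GUI polls (e.g. on a timer), instead of returning a value from the thread. A stdlib-only sketch under that assumption, with canned strings standing in for rec.Result() output:

```python
import queue
import threading

results = queue.Queue()  # worker pushes recognized text; GUI thread polls it

def worker(stop_event):
    # Stand-in for the recognition loop: real code would push rec.Result()
    # strings here each time AcceptWaveform() returns True.
    for text in ["hola", "mundo"]:
        if stop_event.is_set():
            break
        results.put(text)

stop = threading.Event()
t = threading.Thread(target=worker, args=(stop,), daemon=True)
t.start()
t.join()

# In a real GUI this drain would run periodically on the main thread.
transcription = []
while not results.empty():
    transcription.append(results.get())
print(transcription)  # ['hola', 'mundo']
```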

asr_05.zip

nshmyrev commented 1 year ago

Our demo uses sounddevice; we recommend it instead of pyaudio.

Please also edit your post to format code properly.

crmuinos commented 1 year ago

Thank you, I've updated the previous post and posted here the same code with sounddevice. The problem is the same: I cannot recognize anything. I guess it is related to the thread's shared memory.

import subprocess, sys, threading
from vosk import Model, KaldiRecognizer
import logging
import sounddevice as sd 
import queue
import argparse

global q
q = queue.Queue()
model = Model(model_name="vosk-model-small-es-0.42")

def int_or_str(text):
    """Helper function for argument parsing."""
    try:
        return int(text)
    except ValueError:
        return text

def callback(indata, frames, time, status):
    #This is called (from a separate thread) for each audio block.
    if status:
        print(status, file=sys.stderr)
    q.put(bytes(indata))

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument(
    "-l", "--list-devices", action="store_true",
    help="show list of audio devices and exit")

args, remaining = parser.parse_known_args()

if args.list_devices:
    print(sd.query_devices())
    parser.exit(0)

parser = argparse.ArgumentParser(
    description=__doc__,
    formatter_class=argparse.RawDescriptionHelpFormatter,
    parents=[parser])

parser.add_argument(
    "-d", "--device", type=int_or_str,
    help="input device (numeric ID or substring)")

parser.add_argument(
    "-r", "--samplerate", type=int, help="sampling rate")

parser.add_argument(
    "-m", "--model", type=str, help="language model; e.g. en-us, fr, nl; default is en-us")

args = parser.parse_args(remaining)

if args.samplerate is None:
    device_info = sd.query_devices(args.device, "input")
    # soundfile expects an int, sounddevice provides a float:
    args.samplerate = int(device_info["default_samplerate"])

Here is the function called by the thread manager:

def thread_function(name):
    try:
        with sd.RawInputStream(samplerate=args.samplerate, blocksize = 8000, device=args.device, dtype="int16", channels=1, callback=callback):
            print("#" * 80)
            print("Press Ctrl+C to stop the recording")
            print("#" * 80)
            rec = KaldiRecognizer(model, args.samplerate)

            while True:
                data = q.get()
                if rec.AcceptWaveform(data):
                    #print(rec.Result())
                    diccionario = rec.Result()
                    print(diccionario)

    except KeyboardInterrupt:
        print("\nDone")

        parser.exit(0)
    except Exception as e:

        parser.exit(type(e).__name__ + ": " + str(e))
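One thing to watch in this loop: the bare q.get() blocks forever once audio stops arriving, so a GUI would have no clean way to stop the thread. A sketch of a timeout-based variant (the stop condition here is a stand-in for a real threading.Event set by the GUI's stop button):

```python
import queue

q = queue.Queue()
q.put(b"audio-block")  # pretend the sounddevice callback delivered one block

stopped = False
collected = []
while not stopped:
    try:
        # Wake up periodically instead of blocking forever, so the loop
        # can notice a stop request between audio blocks.
        data = q.get(timeout=0.1)
    except queue.Empty:
        stopped = True  # real code would check: stopped = stop_event.is_set()
        continue
    collected.append(data)

print(collected)  # [b'audio-block']
```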

And here is the main:

if __name__ == "__main__": #In a GUI this can be launched from a button_click
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
    logging.info("Main    : before creating thread")
    x = threading.Thread(target=thread_function, args=(1,),daemon=True)
    logging.info("Main    : before running thread")
    x.start()
    logging.info("Main    : wait for the thread to finish")
    x.join()
    logging.info("Main    : all done")

But I'm getting the following output from the log and the recognizer:

12:36:23: Main    : before creating thread
12:36:23: Main    : before running thread
12:36:23: Main    : wait for the thread to finish
################################################################################
Press Ctrl+C to stop the recording
################################################################################
{ "text" : "" }
{ "text" : "" }
{ "text" : "" }

crmuinos commented 1 year ago

The above code replicates the microphone example with sounddevice; I've even kept the argparse code to isolate the changes and detect any possible error. It seems that the thread is created without problems.

Any help is welcomed and appreciated @nshmyrev. I've uploaded the .py, zipped; in the previous post I missed it.

asr_06.zip

nshmyrev commented 1 year ago

Probably your input device is wrong and the system records silence. You need to check whether the original test_microphone.py works for you.

crmuinos commented 1 year ago

Thank you for your reply @nshmyrev. test_microphone.py works fine; I already used it months ago on a regular PC and on a Jetson Xavier NX, both with sounddevice and with pyaudio. For a quick test I prefer pyaudio, to avoid the callback. Trouble arose when I tried to launch the recognition from a button in a GUI, so I tried the code I copied before. Would you mind testing the .py file to check if you get the same results?

Kind regards and thanks in advance.

nshmyrev commented 1 year ago

Your code asr06.py works fine here, I suppose you have the issue with the microphone device:

[shmyrev@Beta example]$ python3 asr_06.py 
LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=11 max-active=4000 lattice-beam=4
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:6:7:8:9:10
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /Users/shmyrev/.cache/vosk/vosk-model-small-es-0.42/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from /Users/shmyrev/.cache/vosk/vosk-model-small-es-0.42/graph/HCLr.fst /Users/shmyrev/.cache/vosk/vosk-model-small-es-0.42/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:303) Loading winfo /Users/shmyrev/.cache/vosk/vosk-model-small-es-0.42/graph/phones/word_boundary.int
18:24:04: Main    : before creating thread
18:24:04: Main    : before running thread
18:24:04: Main    : wait for the thread to finish
################################################################################
Press Ctrl+C to stop the recording
################################################################################
{
  "text" : "muchacha"
}
^CTraceback (most recent call last):
  File "/Users/shmyrev/Documents/IOS/vosk-api-osx/python/example/asr_06.py", line 102, in <module>
    x.join()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 1060, in join
    self._wait_for_tstate_lock()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/threading.py", line 1080, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt

nshmyrev commented 1 year ago

You need to dump the audio to a file and listen to it (test_microphone.py has a -f option for this).
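Listening to the dumped file is the definitive check, but a quick numeric sanity check is to compute the RMS level of the raw int16 blocks coming off the queue; a device that records silence yields values at or near zero. A stdlib-only sketch (the sample blocks below are synthetic, not captured audio):

```python
import math
import struct

def rms(block: bytes) -> float:
    """Root-mean-square level of a raw little-endian int16 audio block."""
    n = len(block) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, block[:2 * n])
    return math.sqrt(sum(s * s for s in samples) / n)

silence = b"\x00\x00" * 4000  # what a wrong or muted input device delivers
tone = struct.pack("<4000h", *(
    int(1000 * math.sin(2 * math.pi * 440 * i / 16000)) for i in range(4000)))

print(rms(silence))       # 0.0
print(rms(tone) > 100.0)  # True
```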