alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

question/feature_request(python): pause/resume #162

Open Kristinita opened 4 years ago

Kristinita commented 4 years ago

1. Summary

I can't find how to pause and resume Vosk.

2. Argumentation

My team and I use Speech-To-Text to transcribe questions automatically (instead of manually) in intellectual games like “What? Where? When?”. Here is how it works:

  1. We start Speech-To-Text. The game host reads a question, which we transcribe via Speech-To-Text.

  2. When the host finishes reading the question, we stop Speech-To-Text and begin to discuss the question.

    1. We use the transcript to look back at the question if we forgot something in it. Transcribing our own discussion gets in the way of this and creates additional difficulties.
  3. When the host begins to read a new question, we start Speech-To-Text again, and so on.

We need to pause and resume Speech-To-Text for this purpose. We don't know how to do this in Vosk.

3. Examples in another apps

I get the expected behavior with the Web Speech API. I open the Web Speech API demo page in Chromium → I press the microphone button whenever I need to start, restart, or stop Speech-To-Text.

4. Not helped

  1. I couldn't find anything about pause/resume in the Vosk documentation or in the vosk_api.h file.
  2. I tried the Pause/Break, Ctrl+S, Ctrl+Z, and F10 keys as described in this and this answer: no effect.

5. Do not offer

Close your console and run python test_microphone.py again

Vosk does not load instantly; users must wait. Restarting Vosk frequently costs users time.

With the Web Speech API (see section 3), users don't wait for anything.

6. Data

6.1. Environment

  1. Windows 10.0.18363 Pro N for Workstations 64-bit EN
  2. Python 3.8.3
  3. Vosk 0.3.7

6.2. Script

Slightly modified test_microphone.py:

"""Test Vosk microphone."""
import json
import os
import sys

import pyaudio

from vosk import KaldiRecognizer
from vosk import Model

if not os.path.exists("model"):
    print("Please download the model and unpack as 'model' in the current folder.")
    sys.exit()

MODEL = Model("model")
REC = KaldiRecognizer(MODEL, 16000)

P = pyaudio.PyAudio()
STREAM = P.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=8000)
STREAM.start_stream()

while True:
    DATA = STREAM.read(14000)
    if len(DATA) == 0:
        pass
    if REC.AcceptWaveform(DATA):
        KIRA_RESULT = REC.Result()
        KIRA_PARSED_JSON = json.loads(KIRA_RESULT)
        print(KIRA_PARSED_JSON['text'])

Thanks.

nshmyrev commented 4 years ago

You simply stop submitting data to the engine, no?

    if len(DATA) == 0:
        pass
    if paused:
        continue  # skip AcceptWaveform while paused
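
A minimal sketch of this suggestion, assuming a `threading.Event` named `paused` (hypothetical, not part of Vosk) that is set and cleared elsewhere in the program. The loop keeps reading audio while paused and discards it, so nothing reaches the recognizer. `run_loop`, `read_chunk`, and `on_text` are illustrative names; `recognizer` is any object with the `AcceptWaveform` and `Result` methods used in the script above, such as `REC`.

```python
import json
import threading


def run_loop(read_chunk, recognizer, paused, on_text):
    """Feed audio to the recognizer unless `paused` is set.

    read_chunk: hypothetical callable returning bytes; an empty chunk
                ends the loop (a live microphone would normally not
                return one, so this is only a way to stop the sketch).
    recognizer: object with AcceptWaveform(bytes) and Result() methods.
    paused:     threading.Event; set means "do not recognize".
    on_text:    callable receiving each recognized phrase.
    """
    while True:
        data = read_chunk()
        if len(data) == 0:
            break
        if paused.is_set():
            # Audio is still read, then dropped, so speech from the
            # discussion never reaches the recognizer.
            continue
        if recognizer.AcceptWaveform(data):
            on_text(json.loads(recognizer.Result())["text"])
```

With the script above, this could be called as `run_loop(lambda: STREAM.read(4000), REC, paused, print)`, where the chunk size is an arbitrary choice.
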
Kristinita commented 4 years ago

@nshmyrev

Type: Reply 💬

Expected behavior

I press a hotkey to pause:

    no output to the console, even if someone is talking into the microphone

I press another hotkey to resume:

    Vosk resumes working as usual.

If I misunderstood you, please clarify your question.

Thanks.

david4ether commented 3 years ago

Bind your hotkey to an event and implement the function that runs when that event happens.
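
One way to read this suggestion, sketched with a small hypothetical `PauseToggle` class: the hotkey's callback flips a shared flag, and the recognition loop checks that flag before submitting audio. Vosk itself provides no hotkey, so how the key is captured is an assumption left to the application; the third-party `keyboard` package offers `keyboard.add_hotkey(...)` for this, shown only in a comment so the block runs without it.

```python
import threading


class PauseToggle:
    """Thread-safe pause flag meant to be flipped from a hotkey callback."""

    def __init__(self):
        self._paused = threading.Event()
        self._lock = threading.Lock()

    def toggle(self):
        """Flip the state and return True if recognition is now paused."""
        with self._lock:
            if self._paused.is_set():
                self._paused.clear()
            else:
                self._paused.set()
            return self._paused.is_set()

    @property
    def paused(self):
        return self._paused.is_set()


toggle = PauseToggle()

# Assumption: with the third-party `keyboard` package installed, the
# callback could be bound like this (not executed here):
#     keyboard.add_hotkey("ctrl+alt+p", toggle.toggle)
#
# In the recognition loop, skip the recognizer while paused:
#     if toggle.paused:
#         continue
```

A single toggle key is a design choice for brevity; separate pause and resume keys would simply bind two callbacks that set and clear the same flag.
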

charl-em commented 3 years ago

Hi Kristina, I'm new to Vosk and facing the same issue. What is the hotkey to resume Vosk?

sskorol commented 3 years ago

Assuming you're using a client-server architecture, you may want to solve this on the client side: just stop sending audio to the server when it's not required. Vosk shouldn't care about your app's logic. It's a low-level API. You either call it to get a transcription or you don't. That's pretty much it.

solyarisoftware commented 3 years ago

@Kristinita I agree with @sskorol.

The Web Speech API example you mention is client-side push-to-talk. I'd suggest doing the same: manage the logic for sending audio messages on the client. This logic could be push-to-talk, continuous listening (see my experiment: https://github.com/solyarisoftware/webad), or a wake word. In any case, it's better to manage events on the client side (IMHO).

You could also do as Nicolay suggested, but why involve the server for an empty request? That is busy-waiting, which wastes resources.

I'd suggest closing the issue. Giorgio
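
On the busy-waiting concern, a hedged sketch of one alternative: the audio loop blocks on a `threading.Event` while paused instead of spinning, so no empty requests are made. `audio_worker`, `read_chunk`, `send`, `running`, and `stop` are hypothetical names, not part of Vosk. With PyAudio, the input could additionally be halted with `stream.stop_stream()` and restarted with `stream.start_stream()`, which is not shown here.

```python
import threading


def audio_worker(read_chunk, send, running, stop):
    """Send audio only while `running` is set; block otherwise.

    read_chunk: hypothetical callable returning the next bytes chunk.
    send:       hypothetical callable that forwards a chunk to the
                recognizer or server.
    running:    threading.Event; cleared means paused.
    stop:       threading.Event; set to end the worker.
    """
    while not stop.is_set():
        # Blocks until resumed; the timeout only lets the loop notice
        # a stop request while paused.
        if not running.wait(timeout=0.1):
            continue
        send(read_chunk())
```

A pause hotkey would call `running.clear()` and a resume hotkey `running.set()`; the worker itself would run in a `threading.Thread`.
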