alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Recognize single word in sentence #345

Closed stanislas-brossette closed 3 years ago

stanislas-brossette commented 3 years ago

Hello, thank you for your work; this API is very impressive and convenient. I am trying to use it in personal projects for home appliances, but I am running into an issue. I would like to quickly recognize a single keyword with high accuracy in any sentence. For example, if someone says the keyword at any point while speaking, it should trigger an action without waiting for the rest of what is being said. So far, I have only been able to recognize full sentences, and single words only when they fall between long silences. I am basing my work on the Python example test_microphone.py. Could you please give me any hints on how to do that? I think I should use the PartialResult JSON structure that is available after AcceptWaveform, but it does not provide a confidence rating for the words. Looking forward to hearing from you. Best regards, Stanislas

nshmyrev commented 3 years ago

Hello Stanislas

Thank you for trying Vosk. Unfortunately, we do not implement keyword spotting yet, but it is our highest priority. Please subscribe to the issue https://github.com/alphacep/vosk-api/issues/107 to get status updates.

In general, the most accurate keyword spotting comes from full speech recognition. You can simply run the speech recognizer and check the words in the output.
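
A minimal sketch of "check the words in the output", assuming the recognizer output is the JSON string returned by Result() or FinalResult() (transcript under "text") or by PartialResult() (transcript under "partial"). The helper name contains_keyword is hypothetical and not part of Vosk; the sample strings stand in for real recognizer output.

```python
import json


def contains_keyword(result_json: str, keyword: str) -> bool:
    """Return True if `keyword` appears as a whole word in a recognizer result.

    Hypothetical helper: reads the "text" field of a final result or the
    "partial" field of a partial result, whichever is present.
    """
    payload = json.loads(result_json)
    transcript = payload.get("text") or payload.get("partial") or ""
    # Compare case-insensitively, word by word.
    return keyword.lower() in transcript.lower().split()


# Stand-in strings shaped like recognizer output, so this runs without a microphone.
print(contains_keyword('{"partial": "please turn on the light"}', "light"))
print(contains_keyword('{"text": "nothing to see here"}', "light"))
```

Calling such a check on each PartialResult() lets an action fire before the utterance ends. Partial results repeat the growing hypothesis, so the same keyword may be seen several times and some de-duplication is likely needed.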

nshmyrev commented 3 years ago

Duplicate of https://github.com/alphacep/vosk-api/issues/107

stanislas-brossette commented 3 years ago

OK, I understand, thank you for your reply. I will try to find a workaround. @nshmyrev Could you please explain what the numbers 16000, 8000, and 4000 mean? It seems that 16000 can't be modified because it corresponds to the sample rate of the speech on which the model was trained. What are 8000 and 4000, and what would be the effect of modifying them?

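from vosk import Model, KaldiRecognizer
import pyaudio

model = Model("model")  # assumed path to an unpacked Vosk model directory
rec = KaldiRecognizer(model, 16000)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()

while True:
    data = stream.read(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # An utterance has ended: print its final transcript
        print(rec.Result())
    else:
        # Utterance still in progress: print the hypothesis so far
        print(rec.PartialResult())

print(rec.FinalResult())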
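
For reference, a small worked calculation of how the three numbers relate, assuming standard PyAudio semantics: rate is the sample rate in Hz, and both frames_per_buffer and the argument to stream.read() count frames (for mono paInt16, one frame is one 2-byte sample). The helper name chunk_stats is hypothetical.

```python
def chunk_stats(sample_rate: int, frames: int, sample_width: int = 2, channels: int = 1):
    """Return (duration in seconds, size in bytes) for `frames` audio frames."""
    seconds = frames / sample_rate
    size_bytes = frames * sample_width * channels
    return seconds, size_bytes


print(chunk_stats(16000, 4000))  # one stream.read(4000) call -> (0.25, 8000)
print(chunk_stats(16000, 8000))  # one frames_per_buffer=8000 buffer -> (0.5, 16000)
```

Under these assumptions, each loop iteration feeds a quarter of a second of audio to the recognizer. Reading fewer frames per call should make partial results arrive more often, at the cost of more calls; the sample rate itself has to match what the model expects.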

Gtxplosiom commented 7 months ago

What if you used Python's "in" operator in a conditional statement? For example: if "Open" in sentence: do something. Sorry for my English, by the way.
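
A small sketch of that idea, with one caveat: on a string, "in" is a substring test, so "open" also matches inside "reopened". Splitting the sentence into words first avoids that, and lowercasing both sides makes the check case-insensitive. The function names and sample sentences are made up for illustration.

```python
def substring_match(keyword: str, sentence: str) -> bool:
    """Substring test: True if the keyword occurs anywhere in the sentence."""
    return keyword.lower() in sentence.lower()


def word_match(keyword: str, sentence: str) -> bool:
    """Whole-word test: True only if the keyword is one of the sentence's words."""
    return keyword.lower() in sentence.lower().split()


sentence = "she reopened the door"
print(substring_match("Open", sentence))            # True (matches inside "reopened")
print(word_match("Open", sentence))                 # False
print(word_match("Open", "please open the door"))   # True
```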