alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Issue on transmitting audio through network #341

Closed · candaj closed this issue 3 years ago

candaj commented 3 years ago

Hello,

I'm hoping you can help me. I'm using a modified version of test_microphone.py to capture sound on one side, transmit it over MQTT to a server, and decode the frames there with Vosk.

The client code looks like this:

from vosk import Model, KaldiRecognizer
import os
import json
import paho.mqtt.publish as publish
import paho.mqtt.client as mqtt
import pyaudio
import time
import pyttsx3

def on_connect(client, userdata, flags, rc):
    print("Connected flags " + str(flags) + " result code " + str(rc) + " client1_id")
    client.connected_flag = True

def on_message(client, userdata, message):
    print("message received " ,str(message.payload.decode("utf-8")))
    print("message topic=",message.topic)
    print("message qos=",message.qos)
    print("message retain flag=",message.retain)
    flag = False

model = Model("model")
rec = KaldiRecognizer(model, 16000)

cli = mqtt.Client("client_tester")
cli.on_message = on_message
cli.connect("192.168.1.84")
cli.subscribe("JARVIS/CLIENTS/JARVIS_CLIENT_BEDROOM/OUT")
cli.loop_start()
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()
while True:
    data = stream.read(4000)
    print("sending %d bytes" % len(data))
    cli.publish("TEST/IN", data)
cli.loop_stop()

and the server part looks like this:

from vosk import Model, KaldiRecognizer
import os
import json
import paho.mqtt.publish as publish
import paho.mqtt.client as mqtt
import pyaudio
import time
import pyttsx3
import base64   

model = Model("model")
rec = KaldiRecognizer(model, 16000)

def on_message(client, userdata, message):
    print(message.payload)
    print(type(message.payload))
    if rec.AcceptWaveform(message.payload):
        text = json.loads(rec.Result())['text']
        print("test="+text)
        print("message topic=",message.topic)
        print("message qos=",message.qos)
        print("message retain flag=",message.retain)
        flag = False
        engine = pyttsx3.init()
        voices = engine.getProperty('voices')
        print(len(voices))
        engine.setProperty('voice', voices[0].id)
        engine.setProperty('rate',180)
        engine.say(text)
        engine.runAndWait()

def on_connect(client, userdata, flags, rc):
    print("Connected flags " + str(flags) + " result code " + str(rc) + " client1_id")
    client.connected_flag = True

text = "test"
cli = mqtt.Client()
cli.on_connect = on_connect
cli.on_message = on_message
cli.connect("192.168.1.84")
cli.subscribe("TEST/IN")
cli.loop_forever()

On both sides the bytes are identical, but the AcceptWaveform method never returns true. If the two scripts are merged (basically test_microphone.py without MQTT), it works perfectly. Any idea?
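Worth noting: vosk's AcceptWaveform only returns True when the recognizer detects the end of an utterance (a pause); for a short chunk in the middle of speech, returning False is normal, and the text recognized so far is exposed through PartialResult. A minimal sketch of per-chunk handling, with `handle_chunk` being a hypothetical helper (not part of vosk) written against vosk's recognizer surface:

```python
import json

def handle_chunk(rec, payload):
    """Feed one audio chunk to a recognizer with vosk's API surface.

    AcceptWaveform returning False is not an error: it returns True only
    at the end of an utterance, so mid-speech chunks yield partials.
    """
    if rec.AcceptWaveform(payload):
        # End of utterance: Result() carries the final text.
        return "final", json.loads(rec.Result())["text"]
    # Mid-utterance: PartialResult() carries the text recognized so far.
    return "partial", json.loads(rec.PartialResult()).get("partial", "")
```

With vosk this would be called as `handle_chunk(rec, message.payload)` inside on_message, where `rec = KaldiRecognizer(model, 16000)`.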

nshmyrev commented 3 years ago

Have you seen our mqtt server?

https://github.com/alphacep/vosk-server/tree/master/mqtt

candaj commented 3 years ago

Yes, I've tested it as described (downloaded the ru model etc.) and AcceptWaveform is failing there as well.

nshmyrev commented 3 years ago

> Yes I've tested it as it is in the description (downloaded ru model etc) and the AcceptWaveform is failing as well

How is it failing exactly?

sskorol commented 3 years ago

@candaj any reason why you specify frames_per_buffer=8000 and then read/send 4k chunks? Also, wondering why you are loading Vosk on the client side?
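For context, PyAudio's stream.read(n) takes a frame count, not a byte count, so the numbers in the paste work out as follows (a back-of-the-envelope check, not from the thread):

```python
RATE = 16000          # samples per second, as passed to p.open()
BYTES_PER_SAMPLE = 2  # paInt16 -> 16-bit mono PCM

frames_read = 4000                            # stream.read(4000)
chunk_seconds = frames_read / RATE            # duration of one chunk
chunk_bytes = frames_read * BYTES_PER_SAMPLE  # payload size per publish
print(chunk_seconds, chunk_bytes)  # → 0.25 8000
```

So each MQTT message carries 0.25 s of audio (8000 bytes), and frames_per_buffer=8000 simply makes the internal buffer twice the read size.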

candaj commented 3 years ago

> Yes I've tested it as it is in the description (downloaded ru model etc) and the AcceptWaveform is failing as well
>
> How is it failing exactly?

It's returning false, as in my example.

> @candaj any reason why you specify frames_per_buffer=8000 and then read/send 4k chunks? Also, wondering why are you loading Vosk on the client-side?

It's the same configuration as in test_microphone.py, so I don't know. Yeah, I should have removed the Vosk part from the paste; it was there to verify on the client side that the data is correct anyway.

Edit: sorry, I misclicked on "close issue"; I've reopened it.

sskorol commented 3 years ago

@candaj take a look at similar code in the server-side repo: https://github.com/alphacep/vosk-server/blob/master/websocket/test_microphone.py It uses the same values and is adapted for streaming. The sample you tried is intended for standalone use, not client-server. Btw, have you tried saving your recording to a file? Just curious whether there's valid voice data at all and you're not streaming silence / damaged chunks due to e.g. mic issues.
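Saving the received audio to a WAV file is straightforward with the standard library; a minimal sketch (the `chunks` list stands in for whatever the MQTT callback has accumulated, and `dump_chunks` is a hypothetical helper):

```python
import wave

def dump_chunks(chunks, path, rate=16000):
    """Write raw 16-bit mono PCM chunks to a WAV file for inspection."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)     # mono
        wf.setsampwidth(2)     # 16-bit samples
        wf.setframerate(rate)  # 16 kHz, matching the recognizer
        for chunk in chunks:
            wf.writeframes(chunk)
```

Opening the result in any audio player (or Audacity) quickly shows whether the stream carries real speech or just silence/noise.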

candaj commented 3 years ago

@sskorol I'll test it this afternoon. But I'm not sure that's the issue, as the base test from the mqtt server is failing too in my case (btw, I'm using vosk 0.3.7, paho-mqtt 1.5.1 and pyaudio 0.2.11, if that helps).

sskorol commented 3 years ago

@candaj the most recent vosk version is 0.3.15. Try updating it as well.

candaj commented 3 years ago

Does version 0.3.15 exist for Windows? I'm developing on Windows right now and can't find the wheel. If not, can I compile it myself, and how?

sskorol commented 3 years ago

I don't believe it exists for Windows yet. Personally, I don't use Windows. You can try, but it might be painful, as you need to build both Kaldi and Vosk. A simple compilation instruction is in the docs, but I'd check the existing Dockerfiles and try to adapt them to your platform. Usually the difference comes down to flags and a couple of dependencies. Without Docker it might be much harder to set everything up on Windows.

candaj commented 3 years ago

Hi,

Tested v0.3.15 on unix; it works well for a simple wav file, but not with the mic yet (other issues). Will continue to work on it. Thank you guys!