alphacep / vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries
Apache License 2.0

Angular vs Python for a vosk Server. #136

Closed rmmal closed 2 years ago

rmmal commented 3 years ago

Someone created a Python Vosk server.

There are two clients, a Python one and an Angular one. When I try the following Python client, it returns good results.

          #!/usr/bin/env python3

          import asyncio
          import websockets
          import ssl
          import sys
          from pyaudio import PyAudio, Stream, paInt16
          from contextlib import  contextmanager
          from typing import AsyncGenerator, Generator
          from async_generator import asynccontextmanager
          from async_exit_stack import AsyncExitStack
          from bitstream import BitStream
          ssl_context = ssl.create_default_context()
          ssl_context.check_hostname = False
          ssl_context.verify_mode = ssl.CERT_NONE

          from numpy import *
          @contextmanager
          def _pyaudio() -> Generator[PyAudio, None, None]:
              p = PyAudio()
              try:
                  yield p
              finally:
                  print('Terminating PyAudio object')
                  p.terminate()

          @contextmanager
          def _pyaudio_open_stream(p: PyAudio, *args, **kwargs) -> Generator[Stream, None, None]:
              s = p.open(*args, **kwargs)
              try:
                  yield s
              finally:
                  print('Closing PyAudio Stream')
                  s.close()

          @asynccontextmanager
          async def _polite_websocket(ws: websockets.WebSocketClientProtocol) -> AsyncGenerator[websockets.WebSocketClientProtocol, None]:
              try:
                  yield ws
              finally:
                  print('Terminating connection')
                  await ws.send('{"eof" : 1}')
                  print(await ws.recv())

          async def hello(uri):
              async with AsyncExitStack() as stack:
                  ws = await stack.enter_async_context(websockets.connect(uri))
                  print(f'Connected to {uri}')
                  print('Type Ctrl-C to exit')
                  ws = await stack.enter_async_context(_polite_websocket(ws))
                  p = stack.enter_context(_pyaudio())
                  s = stack.enter_context(_pyaudio_open_stream(p,
                      format = paInt16, 
                      channels = 1,
                      rate = 16000,
                      input = True, 
                      frames_per_buffer = 8000))
                  while True:
                      data = s.read(8000)
                      stream = BitStream(data)

                      if len(data) == 0:
                          break
                      await ws.send(data)
                      print(await ws.recv())

          if len(sys.argv) == 2:
              server = sys.argv[1]
          else:
              # host:port only; the ws:// scheme is prepended below
              server = 'xx.xxx.xxx.xxx:xxxx'

          try:
              loop = asyncio.get_event_loop()
              loop.run_until_complete(
                  hello('ws://' + server))
          except (Exception, KeyboardInterrupt) as e:
              loop.stop()
              loop.run_until_complete(
                  loop.shutdown_asyncgens())
              if isinstance(e, KeyboardInterrupt):
                  print('Bye')
                  exit(0)
              else:
                  print(f'Oops! {e}')
                  exit(1)

But I need to integrate with a microphone through the web, so I tried your Angular client demo. The results were different from the Python client's, and not as good.

So I need some help finding where the problem is. I guess it is a matter of configuration, as I found some parameters in the Angular client that have no obvious counterpart in the Python script:

          readonly SERVER = "ws://xx.xxx.xxx.xxx:xxxx";
          // What INTERVAL value should I use to get results like the Python client?
          readonly INTERVAL = 250;
          // I guess this is the buffer size (Python sends 8K, so I did the same here)
          var node = source.context.createScriptProcessor(8192, 1, 1);
          // There is a line in Worker.js that has some values too:
          resampler = new Resampler(sampleRate, 8000, 1, 8 * 1024, false);

So what should I do to get the same results as the Python client?
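To compare the two clients, it may help to convert each buffer size into seconds of audio. Below is a rough sketch using the numbers quoted above. The helper names are made up for illustration, and the 44100 Hz browser rate is an assumption, since the actual AudioContext sample rate is not stated here.

```python
def chunk_seconds(frames, sample_rate):
    """Duration in seconds of `frames` audio frames at `sample_rate` Hz."""
    return frames / sample_rate


def chunk_bytes(frames, channels=1, sample_width=2):
    """Size in bytes of `frames` frames of PCM audio (16-bit mono by default)."""
    return frames * channels * sample_width


# Python client: s.read(8000) frames, recorded at rate = 16000
python_seconds = chunk_seconds(8000, 16000)   # 0.5
python_bytes = chunk_bytes(8000)              # 16000

# Angular client: ScriptProcessor buffer of 8192 frames.
# ASSUMPTION: the browser records at 44100 Hz; check the real AudioContext rate.
browser_seconds = chunk_seconds(8192, 44100)

print(f"Python chunk:  {python_seconds:.3f} s, {python_bytes} bytes")
print(f"Browser chunk: {browser_seconds:.3f} s before resampling")
```

So "8K" means different things on the two sides: half a second of 16 kHz audio in Python, but under 0.2 s of audio in the browser at the assumed rate. Also, if the second argument of `Resampler` is the output rate, as its position suggests, the Angular client sends 8000 Hz audio while the Python client sends 16000 Hz audio, which would be worth checking against the rate the model expects.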

nshmyrev commented 3 years ago

Does "not so good" mean the results are totally off, or similar with just a few words different?

What model do you use on the server?

rmmal commented 3 years ago

The issue is that I can't map the Python configuration onto the parameters available in Angular. Sometimes the results are totally off. After playing with the parameters they become the same, but the Python client may respond faster. After more tweaking, the Angular results were no longer partial but final, meaning I wait about a second and get the whole sentence (not word by word), whereas in Python I sometimes get a sentence and sometimes a few words or a single word. Either way, the Python client always responds faster.

So I need a better understanding of the parameters above and of how to make the Angular client behave like the Python one.

I am using a model developed by our team.
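To make the partial-versus-final comparison concrete, here is a small sketch that labels each server reply. It assumes the server answers with JSON that carries a `partial` key for interim hypotheses and a `text` key for finalized utterances, which is how Vosk recognizers usually report results. The function name `classify_reply` is hypothetical.

```python
import json


def classify_reply(message):
    """Return ("final" | "partial" | "other", text) for one JSON reply string.

    ASSUMPTION: final results carry a "text" key and interim ones a "partial" key.
    """
    reply = json.loads(message)
    if "text" in reply:
        return ("final", reply["text"])
    if "partial" in reply:
        return ("partial", reply["partial"])
    return ("other", "")


# Example replies, made up for illustration
print(classify_reply('{"partial": "hello wor"}'))
print(classify_reply('{"text": "hello world"}'))
```

Logging the label together with a timestamp in both clients would show whether the Angular client really receives only final results or just receives replies less often.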

nshmyrev commented 3 years ago

You can dump the audio coming from the Angular client and listen to it to check for any issues.

rmmal commented 3 years ago

How can I dump the audio from this Angular code? Any hints or samples, if possible?
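One way to get a dump without touching the Angular code is to do it on the receiving side: collect the raw binary chunks that arrive over the websocket and write them to a WAV file. This is a minimal sketch, not the server's own code. It assumes the chunks are 16-bit mono PCM, and the function name `dump_pcm_chunks` and the default rate of 8000 Hz (taken from the `Resampler` line above) are illustrative assumptions.

```python
import wave


def dump_pcm_chunks(chunks, path, sample_rate=8000, channels=1, sample_width=2):
    """Write raw PCM chunks to a WAV file and return the duration in seconds.

    ASSUMPTION: chunks are little-endian 16-bit mono PCM at `sample_rate` Hz.
    """
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes / (sample_width * channels * sample_rate)


# Usage: pass the binary messages received from the client, in arrival order.
# Two chunks of 4000 silent samples each are used here as stand-in data.
seconds = dump_pcm_chunks([b"\x00\x00" * 4000, b"\x00\x00" * 4000], "angular_dump.wav")
print(f"Wrote {seconds:.2f} s of audio")
```

If the file plays back too fast, too slow, or as noise, the sample rate or sample format the client actually sends probably differs from what was assumed when writing the header.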