alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

No speech detected #165

Open ghost opened 5 years ago

ghost commented 5 years ago

I have one master and one worker up and running, and I have verified that the worker started properly. I have also set up a Node.js client using https://github.com/Kaljurand/dictate.js. The client server is very simple, and the only thing I changed is the path that points to the worker.

However, I am unable to send data from the client to the worker. My worker log is below. The app log gives me nothing to debug with, and I can't get much information out of the worker log either.

Worker log:

DEBUG 2018-12-17 02:00:06,694 Starting up worker
2018-12-17 02:00:06 -    INFO:   decoder2: Creating decoder using conf: {'post-processor': "perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\\1./;'", 'logging': {'version': 1, 'root': {'level': 'DEBUG', 'handlers': ['console']}, 'formatters': {'simpleFormater': {'datefmt': '%Y-%m-%d %H:%M:%S', 'format': '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'}}, 'disable_existing_loggers': False, 'handlers': {'console': {'formatter': 'simpleFormater', 'class': 'logging.StreamHandler', 'level': 'DEBUG'}}}, 'use-vad': False, 'decoder': {'ivector-extraction-config': '/opt/models/english/tedlium_nnet_ms_sp_online/conf/ivector_extractor.conf', 'num-nbest': 10, 'lattice-beam': 6.0, 'acoustic-scale': 0.083, 'do-endpointing': True, 'beam': 10.0, 'max-active': 10000, 'fst': '/opt/models/english/tedlium_nnet_ms_sp_online/HCLG.fst', 'mfcc-config': '/opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf', 'use-threaded-decoder': True, 'traceback-period-in-secs': 0.25, 'model': '/opt/models/english/tedlium_nnet_ms_sp_online/final.mdl', 'word-syms': '/opt/models/english/tedlium_nnet_ms_sp_online/words.txt', 'endpoint-silence-phones': '1:2:3:4:5:6:7:8:9:10', 'chunk-length-in-secs': 0.25}, 'silence-timeout': 10, 'out-dir': 'tmp', 'use-nnet2': True}
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: ivector-extraction-config = /opt/models/english/tedlium_nnet_ms_sp_online/conf/ivector_extractor.conf
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: num-nbest = 10
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: lattice-beam = 6.0
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: acoustic-scale = 0.083
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: do-endpointing = True
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: beam = 10.0
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: max-active = 10000
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: mfcc-config = /opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: traceback-period-in-secs = 0.25
2018-12-17 02:00:06 -    INFO:   decoder2: Setting decoder property: word-syms = /opt/models/english/tedlium_nnet_ms_sp_online/words.txt
2018-12-17 02:00:07 -    INFO:   decoder2: Setting decoder property: endpoint-silence-phones = 1:2:3:4:5:6:7:8:9:10
2018-12-17 02:00:07 -    INFO:   decoder2: Setting decoder property: chunk-length-in-secs = 0.25
2018-12-17 02:00:07 -    INFO:   decoder2: Setting decoder property: fst = /opt/models/english/tedlium_nnet_ms_sp_online/HCLG.fst
2018-12-17 02:00:15 -    INFO:   decoder2: Setting decoder property: model = /opt/models/english/tedlium_nnet_ms_sp_online/final.mdl
2018-12-17 02:00:15 -    INFO:   decoder2: Created GStreamer elements
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstAppSrc object at 0x7f516814b370 (GstAppSrc at 0x14faf30)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstDecodeBin object at 0x7f516814b320 (GstDecodeBin at 0x1502090)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstAudioConvert object at 0x7f516814b410 (GstAudioConvert at 0x1524690)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstAudioResample object at 0x7f516814b2d0 (GstAudioResample at 0x13e1bc0)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstTee object at 0x7f516814b3c0 (GstTee at 0x1535000)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstQueue object at 0x7f516814b4b0 (GstQueue at 0x1538210)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstFileSink object at 0x7f516814b500 (GstFileSink at 0x153d200)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstQueue object at 0x7f516814b550 (GstQueue at 0x1538500)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.Gstkaldinnet2onlinedecoder object at 0x7f516814b5a0 (Gstkaldinnet2onlinedecoder at 0x155e0a0)> to the pipeline
2018-12-17 02:00:15 -   DEBUG:   decoder2: Adding <__main__.GstFakeSink object at 0x7f516814b5f0 (GstFakeSink at 0x143a600)> to the pipeline
2018-12-17 02:00:15 -    INFO:   decoder2: Linking GStreamer elements
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
2018-12-17 02:00:15 -    INFO:   decoder2: Setting pipeline to READY
2018-12-17 02:00:15 -    INFO:   decoder2: Set pipeline to READY
2018-12-17 02:00:15 -    INFO:   __main__: Opening websocket connection to master server
2018-12-17 02:00:15 -    INFO:   __main__: Opened websocket connection to server
2018-12-17 02:00:46 -   DEBUG:   __main__: <undefined>: Got message from server of type <class 'ws4py.messaging.TextMessage'>
2018-12-17 02:00:46 -    INFO:   decoder2: 186a155f-59c5-490d-a069-cb4481ddf4c4: Initializing request
2018-12-17 02:00:46 -    INFO:   decoder2: 186a155f-59c5-490d-a069-cb4481ddf4c4: Setting caps to audio/x-raw, layout=(string)interleaved, rate=(int)16000, format=(string)S16LE, channels=(int)1
2018-12-17 02:00:46 -    INFO:   decoder2: 186a155f-59c5-490d-a069-cb4481ddf4c4: Connecting audio decoder
2018-12-17 02:00:46 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:46 -    INFO:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Started timeout guard
2018-12-17 02:00:46 -    INFO:   decoder2: 186a155f-59c5-490d-a069-cb4481ddf4c4: Connected audio decoder
2018-12-17 02:00:46 -    INFO:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Initialized request
2018-12-17 02:00:47 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:48 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:49 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:50 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:51 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:52 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:53 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:54 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:55 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Checking that decoder hasn't been silent for more than 10 seconds
2018-12-17 02:00:56 - WARNING:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: More than 10 seconds from last decoder hypothesis update, cancelling
2018-12-17 02:00:56 -    INFO:   decoder2: 186a155f-59c5-490d-a069-cb4481ddf4c4: Resetting decoder state
2018-12-17 02:00:56 -   DEBUG:      ws4py: Closing message received (1000) ''
2018-12-17 02:00:56 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Websocket closed() called
2018-12-17 02:00:56 -   DEBUG:   __main__: 186a155f-59c5-490d-a069-cb4481ddf4c4: Websocket closed() finished
2018-12-17 02:00:57 -    INFO:   __main__: Opening websocket connection to master server
2018-12-17 02:00:57 -    INFO:   __main__: Opened websocket connection to server

alumae commented 5 years ago

The server stores all audio that it receives to the directory configured with the out-dir property in the YAML file. Check that there is actually speech in the audio. Most probably you are just sending it silence, due to some client-side audio configuration issue.
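One way to run that check without listening to each file is to measure the loudness of the stored audio. The sketch below is only an illustration and rests on assumptions not confirmed in this thread: that the files in out-dir end in .raw, that they hold headerless 16-bit little-endian mono PCM (matching the caps in the log above), and that an RMS below 100 means silence. The helper names rms_int16, looks_silent, and report are hypothetical.

```python
import math
import struct
from pathlib import Path


def rms_int16(pcm_bytes):
    """Return the RMS amplitude of headerless 16-bit little-endian PCM data."""
    # Ignore a trailing odd byte so the data splits into whole samples.
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)
    count = usable // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[:usable])
    return math.sqrt(sum(s * s for s in samples) / count)


def looks_silent(pcm_bytes, threshold=100.0):
    """Guess whether the audio is silence; the threshold is an arbitrary assumption."""
    return rms_int16(pcm_bytes) < threshold


def report(out_dir):
    """Print one line per *.raw file in out_dir (the file extension is an assumption)."""
    for path in sorted(Path(out_dir).glob("*.raw")):
        data = path.read_bytes()
        label = "silent?" if looks_silent(data) else "has signal"
        print("%s: %d bytes, rms=%.1f, %s" % (path.name, len(data), rms_int16(data), label))
```

Calling report("tmp") from the worker's working directory would list each stored file. If the directory is empty, no audio reached the worker at all, which is a different problem from receiving silence.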

ghost commented 5 years ago

Thanks for the insight. To start my worker, I used https://github.com/jcsilva/docker-kaldi-gstreamer-server. The only out-dir setting is the one in sample_english_nnet2.yaml, and it points to tmp in the Docker container. Inside tmp, the only content is pip_build_root; there are no audio files.

At the same time, I know my client-side audio configuration works because your mob demo works fine. Therefore, the issue must be in the connection between my Node server and the Kaldi server.

On the Node.js side, I changed the following in dictate.js:

var SERVER = "ws://localhost:8080/client/ws/speech";
var SERVER_STATUS = "ws://localhost:8080/client/ws/status";
var REFERENCE_HANDLER = "ws://localhost:8080/client/dynamic/reference";

and added the following to my home page:

option(value='ws://localhost:8080/client/ws/speech|ws://localhost:8080/client/ws/status', selected='selected') localhost

I use localhost because Kaldi is running in a container on my local machine. I'm not sure why this doesn't work.
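A first thing to rule out is whether the port in those URLs is reachable from the host at all, for example because the container's port is not published. The sketch below is a minimal check under that assumption: it opens a plain TCP connection to the host and port taken from a ws:// URL and does not perform a WebSocket handshake or send audio. The function name ws_endpoint_reachable is hypothetical.

```python
import socket
from urllib.parse import urlparse


def ws_endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    default_port = 443 if parsed.scheme == "wss" else 80
    port = parsed.port or default_port
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, ws_endpoint_reachable("ws://localhost:8080/client/ws/speech") returning False would point at the port mapping rather than at dictate.js. A True result only shows that something is listening on that port.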