alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Error when recording from microphone #182

Open maggieezzat opened 5 years ago

maggieezzat commented 5 years ago

I am trying to record from the microphone using this command:

arecord -f S16_LE -r 16000 | python kaldigstserver/client.py -r 32000 -
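
For readers wondering why the command pairs `-r 16000` with `-r 32000`: the two flags appear to measure different things. `arecord -r` is the sample rate in Hz, while the client's `-r` is, as far as its help text describes it, a send rate in bytes per second. The sketch below only checks that arithmetic; the helper name `pcm_byte_rate` is made up for illustration and is not part of the project.

```python
def pcm_byte_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bytes per second of uncompressed PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels


# S16_LE at 16000 Hz, mono (the format reported by arecord in this issue)
print(pcm_byte_rate(16000))  # 32000
```

Under that reading, the rates in the command are consistent with each other, so the mismatch in numbers is probably not the cause of the error.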

However, I get this error in the client output:

Recording WAVE 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Received error from server (status 1)

Output from the server:

DEBUG 2019-04-04 12:13:58,638 Starting up server
INFO 2019-04-04 12:14:12,266 101 GET /worker/ws/speech (127.0.0.1) 0.49ms
INFO 2019-04-04 12:14:12,266 New worker available <main.WorkerSocketHandler object at 0x7fb7b10c90d0>
INFO 2019-04-04 12:14:31,184 101 GET /client/ws/speech?content-type= (127.0.0.1) 0.39ms
INFO 2019-04-04 12:14:31,185 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: OPEN
INFO 2019-04-04 12:14:31,185 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Request arguments: content-type=""
INFO 2019-04-04 12:14:31,185 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Using worker <main.DecoderSocketHandler object at 0x7fb7b10c9a90>
INFO 2019-04-04 12:14:46,248 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Sending event {u'status': 1, 'id': 'bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9'} to client
INFO 2019-04-04 12:14:46,249 Worker <main.WorkerSocketHandler object at 0x7fb7b10c90d0> leaving
INFO 2019-04-04 12:14:46,249 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Handling on_connection_close()
INFO 2019-04-04 12:14:46,249 bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Closing worker connection
INFO 2019-04-04 12:14:47,253 101 GET /worker/ws/speech (127.0.0.1) 0.38ms
INFO 2019-04-04 12:14:47,253 New worker available <main.WorkerSocketHandler object at 0x7fb7b10c9510>

Output from the worker:

DEBUG 2019-04-04 12:14:10,074 Starting up worker
2019-04-04 12:14:10 - INFO: decoder2: Creating decoder using conf: {'post-processor': "perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'", 'logging': {'version': 1, 'root': {'level': 'DEBUG', 'handlers': ['console']}, 'formatters': {'simpleFormater': {'datefmt': '%Y-%m-%d %H:%M:%S', 'format': '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'}}, 'disable_existing_loggers': False, 'handlers': {'console': {'formatter': 'simpleFormater', 'class': 'logging.StreamHandler', 'level': 'DEBUG'}}}, 'use-vad': False, 'decoder': {'ivector-extraction-config': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/ivector_extractor/ivector_extractor.conf', 'lattice-beam': 5.0, 'acoustic-scale': 1.0, 'do-endpointing': True, 'beam': 5.0, 'mfcc-config': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/conf/mfcc_hires.conf', 'traceback-period-in-secs': 0.25, 'nnet-mode': 3, 'endpoint-silence-phones': '1:2:3:4:5:6', 'word-syms': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/words.txt', 'num-nbest': 10, 'frame-subsampling-factor': 3, 'phone-syms': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/phones.txt', 'max-active': 10000, 'fst': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/HCLG.fst', 'use-threaded-decoder': True, 'model': 'de_400k_nnet3chain_tdnn1f_2048_sp_bi/final.mdl', 'chunk-length-in-secs': 0.25}, 'silence-timeout': 15, 'out-dir': 'tmp', 'use-nnet2': True}
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: nnet-mode = 3
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: ivector-extraction-config = de_400k_nnet3chain_tdnn1f_2048_sp_bi/ivector_extractor/ivector_extractor.conf
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: lattice-beam = 5.0
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: acoustic-scale = 1.0
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: do-endpointing = True
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: beam = 5.0
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: mfcc-config = de_400k_nnet3chain_tdnn1f_2048_sp_bi/conf/mfcc_hires.conf
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: traceback-period-in-secs = 0.25
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: endpoint-silence-phones = 1:2:3:4:5:6
2019-04-04 12:14:10 - INFO: decoder2: Setting decoder property: word-syms = de_400k_nnet3chain_tdnn1f_2048_sp_bi/words.txt
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: num-nbest = 10
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: frame-subsampling-factor = 3
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: phone-syms = de_400k_nnet3chain_tdnn1f_2048_sp_bi/phones.txt
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: max-active = 10000
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: chunk-length-in-secs = 0.25
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: fst = de_400k_nnet3chain_tdnn1f_2048_sp_bi/HCLG.fst
2019-04-04 12:14:11 - INFO: decoder2: Setting decoder property: model = de_400k_nnet3chain_tdnn1f_2048_sp_bi/final.mdl
LOG ([5.5.266~1-77ac7]:CompileLooped():nnet-compile-looped.cc:345) Spent 0.0172811 seconds in looped compilation.
2019-04-04 12:14:12 - INFO: decoder2: Created GStreamer elements
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstAppSrc object at 0x7f723b1385f0 (GstAppSrc at 0x5614e3fa81d0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstDecodeBin object at 0x7f723b1385a0 (GstDecodeBin at 0x5614e3fb20e0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstAudioConvert object at 0x7f723b138690 (GstAudioConvert at 0x5614e3fdbb10)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstAudioResample object at 0x7f723b138550 (GstAudioResample at 0x5614e3fdf8a0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstTee object at 0x7f723b138640 (GstTee at 0x5614e3fe2000)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstQueue object at 0x7f723b138730 (GstQueue at 0x5614e3fe60d0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstFileSink object at 0x7f723b138780 (GstFileSink at 0x5614e3fec1e0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstQueue object at 0x7f723b1387d0 (GstQueue at 0x5614e3fe63d0)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.Gstkaldinnet2onlinedecoder object at 0x7f723b138820 (Gstkaldinnet2onlinedecoder at 0x5614e3fee140)> to the pipeline
2019-04-04 12:14:12 - DEBUG: decoder2: Adding <gi.GstFakeSink object at 0x7f723b138870 (GstFakeSink at 0x5614e4028bc0)> to the pipeline
2019-04-04 12:14:12 - INFO: decoder2: Linking GStreamer elements
LOG ([5.5.266~1-77ac7]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.5.266~1-77ac7]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
2019-04-04 12:14:12 - INFO: decoder2: Setting pipeline to READY
2019-04-04 12:14:12 - INFO: decoder2: Set pipeline to READY
2019-04-04 12:14:12 - INFO: main: Opening websocket connection to master server
2019-04-04 12:14:12 - INFO: main: Opened websocket connection to server
2019-04-04 12:14:31 - DEBUG: main: : Got message from server of type <class 'ws4py.messaging.TextMessage'>
2019-04-04 12:14:31 - INFO: decoder2: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Initializing request
2019-04-04 12:14:31 - INFO: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Started timeout guard
2019-04-04 12:14:31 - INFO: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Initialized request
2019-04-04 12:14:31 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:32 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:33 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:34 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:35 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:36 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:37 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:38 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:39 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:40 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:41 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:42 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:43 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:44 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:45 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Checking that decoder hasn't been silent for more than 15 seconds
2019-04-04 12:14:46 - WARNING: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: More than 15 seconds from last decoder hypothesis update, cancelling
2019-04-04 12:14:46 - INFO: decoder2: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Resetting decoder state
2019-04-04 12:14:46 - DEBUG: ws4py: Closing message received (1000) ''
2019-04-04 12:14:46 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Websocket closed() called
2019-04-04 12:14:46 - DEBUG: main: bd36902b-a033-4e0e-8c29-1c7a4b6ec7f9: Websocket closed() finished
2019-04-04 12:14:47 - INFO: main: Opening websocket connection to master server
2019-04-04 12:14:47 - INFO: main: Opened websocket connection to server
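
The worker log shows what produced the status 1: the config sets 'silence-timeout': 15, a timeout guard checks once per second, and the request is cancelled when no decoder hypothesis has arrived for more than 15 seconds. As a rough illustration of that pattern only (this is not the project's actual code; the class and method names are hypothetical), such a guard could look like this:

```python
import time


class SilenceGuard(object):
    """Hypothetical watchdog: expires when no update arrives for `timeout` seconds."""

    def __init__(self, timeout=15.0, clock=time.time):
        self.timeout = timeout
        self.clock = clock  # injectable so the guard can be tested without sleeping
        self.last_update = clock()

    def touch(self):
        # Call whenever the decoder emits a (partial) hypothesis.
        self.last_update = self.clock()

    def expired(self):
        # True once more than `timeout` seconds have passed since the last update.
        return self.clock() - self.last_update > self.timeout
```

Read this way, the error means the decoder never produced any hypothesis from the incoming audio, which points at the audio stream (format, content, or delivery) rather than at the websocket connection itself.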

alx741 commented 5 years ago

Try running this (stop recording with Ctrl+C):

arecord -f S16_LE -r 16000 > speech.wav

Then play it back with a player you know works, for instance VLC:

vlc speech.wav
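
If listening is not practical (for example on a headless machine), the same check can be approximated by inspecting the file. The sketch below uses only Python 3's standard library and assumes speech.wav is 16-bit PCM, as requested by `-f S16_LE`; the function name `wav_summary` is made up for illustration. A peak close to 0 would suggest the microphone is muted or the wrong capture device is selected.

```python
import array
import sys
import wave


def wav_summary(path):
    """Return header fields, duration, and peak sample of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    if sample_width != 2:
        raise ValueError("expected 16-bit samples, got %d-byte samples" % sample_width)

    samples = array.array("h")
    # Drop a trailing odd byte, if any, so the buffer is a whole number of samples.
    samples.frombytes(frames[: len(frames) - len(frames) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    seconds = len(samples) / float(channels * rate)
    return {"channels": channels, "rate": rate, "seconds": seconds, "peak": peak}


if __name__ == "__main__" and len(sys.argv) > 1:
    print(wav_summary(sys.argv[1]))
```

Saved under a name of your choosing, it would be run with the path to speech.wav as its only argument. For the setup in this issue, the expected header values are 1 channel at 16000 Hz.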

maitrungduc1410 commented 4 years ago

Here I have complete, working code for decoding from the microphone; Python 2 and 3 are both supported. Hope this helps someone.