alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Gstream-server with NNet3 Decoder #130

Closed aonotas closed 6 years ago

aonotas commented 6 years ago

(updated) Hi! I'm a beginner with Kaldi and kaldi-gstreamer-server. I have a Kaldi NNet3 model trained on the CSJ corpus. I tried to run it with the GStreamer server, but I got the following errors. Could you give me some advice?

Using steps/online/nnet3/prepare_online_decoding.sh, I converted the Kaldi model for online decoding:

steps/online/nnet3/prepare_online_decoding.sh data/lang exp/nnet3/extractor exp/nnet3/tdnn1a exp/nnet3/online_output

and created a YAML config file ($ vim sample_csj_nnet3.yaml):

# You have to download TEDLIUM "online nnet2" models in order to use this sample
# Run download-tedlium-nnet2.sh in 'test/models' to download them.
#use-nnet2: True
#use-nnet2: False
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    #use-threaded-decoder:  true
    nnet-mode: 3
    model : exp/nnet3/online_output/final.mdl
    word-syms : exp/nnet3/online_output/phones.txt
    # fst :
    mfcc-config : exp/nnet3/online_output/conf/mfcc.conf
    ivector-extraction-config : exp/nnet3/online_output/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.083
    do-endpointing : true
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    num-nbest: 10
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
full-post-processor: ./sample_full_post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]

and ran the GStreamer worker like this:

$ python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c sample_csj_nnet3.yaml

Then I got the following error messages:

$ python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c sample_csj_nnet3.yaml
/home/aaa/asr/kaldi-gstreamer-server-forked/kaldigstserver/decoder.py:11: PyGIDeprecationWarning: Since version 3.11, calling threads_init is no longer needed. See: https://wiki.gnome.org/PyGObject/Threading
  GObject.threads_init()
/home/aaa/asr/kaldi-gstreamer-server-forked/kaldigstserver/decoder2.py:11: PyGIDeprecationWarning: Since version 3.11, calling threads_init is no longer needed. See: https://wiki.gnome.org/PyGObject/Threading
  GObject.threads_init()
   DEBUG 2018-05-15 16:33:33,477 Starting up worker
2018-05-15 16:33:33 -    INFO:   decoder2: Creating decoder using conf: {'post-processor': "perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\\1./;'", 'logging': {'version': 1, 'root': {'level': 'DEBUG', 'handlers': ['console']}, 'formatters': {'simpleFormater': {'datefmt': '%Y-%m-%d %H:%M:%S', 'format': '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'}}, 'disable_existing_loggers': False, 'handlers': {'console': {'formatter': 'simpleFormater', 'class': 'logging.StreamHandler', 'level': 'DEBUG'}}}, 'use-nnet2': True, 'full-post-processor': './sample_full_post_processor.py', 'decoder': {'ivector-extraction-config': 'exp/nnet3/online_output/conf/ivector_extractor.conf', 'num-nbest': 10, 'lattice-beam': 6.0, 'phone-syms': 'exp/nnet3/online_output/phones.txt', 'acoustic-scale': 0.083, 'do-endpointing': True, 'beam': 10.0, 'max-active': 10000, 'mfcc-config': 'exp/nnet3/online_output/conf/mfcc.conf', 'traceback-period-in-secs': 0.25, 'nnet-mode': 3, 'model': 'exp/nnet3/online_output/final.mdl', 'endpoint-silence-phones': '1:2:3:4:5:6:7:8:9:10', 'feature-type': 'mfcc', 'chunk-length-in-secs': 0.25}, 'silence-timeout': 10, 'out-dir': 'tmp', 'use-vad': False}
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: nnet-mode = 3
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: ivector-extraction-config = exp/nnet3/online_output/conf/ivector_extractor.conf
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: num-nbest = 10
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: lattice-beam = 6.0
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: phone-syms = exp/nnet3/online_output/phones.txt
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: acoustic-scale = 0.083
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: do-endpointing = True
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: beam = 10.0
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: max-active = 10000
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: mfcc-config = exp/nnet3/online_output/conf/mfcc.conf
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: traceback-period-in-secs = 0.25
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: endpoint-silence-phones = 1:2:3:4:5:6:7:8:9:10
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: feature-type = mfcc
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: chunk-length-in-secs = 0.25
2018-05-15 16:33:33 -    INFO:   decoder2: Setting decoder property: model = exp/nnet3/online_output/final.mdl
LOG ([5.4.86~1-9b90c]:CompileLooped():nnet-compile-looped.cc:334) Spent 0.144493 seconds in looped compilation.
2018-05-15 16:33:34 -    INFO:   decoder2: Created GStreamer elements
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstAppSrc object at 0x7fcfcc1a6230 (GstAppSrc at 0x559c1da128c0)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstDecodeBin object at 0x7fcfcc1a60f0 (GstDecodeBin at 0x559c1da0e1e0)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstAudioConvert object at 0x7fcfcc1a60a0 (GstAudioConvert at 0x559c1da310d0)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstAudioResample object at 0x7fcfcc1a61e0 (GstAudioResample at 0x559c1da40360)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstTee object at 0x7fcfcc1a6140 (GstTee at 0x559c1da43000)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstQueue object at 0x7fcfcc1a6190 (GstQueue at 0x559c1da48170)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstFileSink object at 0x7fcfcc1a6280 (GstFileSink at 0x559c1da47200)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstQueue object at 0x7fcfcc1a62d0 (GstQueue at 0x559c1da48460)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.Gstkaldinnet2onlinedecoder object at 0x7fcfcc1a6320 (Gstkaldinnet2onlinedecoder at 0x559c1da4e030)> to the pipeline
2018-05-15 16:33:34 -   DEBUG:   decoder2: Adding <__gi__.GstFakeSink object at 0x7fcfcc1a6370 (GstFakeSink at 0x559c1db971e0)> to the pipeline
2018-05-15 16:33:34 -    INFO:   decoder2: Linking GStreamer elements
LOG ([5.4.86~1-9b90c]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.4.86~1-9b90c]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
2018-05-15 16:33:34 -    INFO:   decoder2: Setting pipeline to READY
2018-05-15 16:33:34 -    INFO:   decoder2: Set pipeline to READY
kaldigstserver/worker.py:413: PyGIDeprecationWarning: GObject.MainLoop is deprecated; use GLib.MainLoop instead
  loop = GObject.MainLoop()
2018-05-15 16:33:34 -    INFO:   __main__: Opening websocket connection to master server
2018-05-15 16:33:34 -    INFO:   __main__: Opened websocket connection to server
2018-05-15 16:33:38 -   DEBUG:   __main__: <undefined>: Got message from server of type <class 'ws4py.messaging.TextMessage'>
2018-05-15 16:33:38 -    INFO:   decoder2: 126b1893-1a67-4f52-b392-8c3b05de55a7: Initializing request
2018-05-15 16:33:38 -    INFO:   __main__: 126b1893-1a67-4f52-b392-8c3b05de55a7: Started timeout guard
2018-05-15 16:33:38 -   DEBUG:   __main__: 126b1893-1a67-4f52-b392-8c3b05de55a7: Checking that decoder hasn't been silent for more than 10 seconds
2018-05-15 16:33:38 -    INFO:   __main__: 126b1893-1a67-4f52-b392-8c3b05de55a7: Initialized request
2018-05-15 16:33:38 -   DEBUG:   __main__: 126b1893-1a67-4f52-b392-8c3b05de55a7: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
2018-05-15 16:33:38 -   DEBUG:   decoder2: 126b1893-1a67-4f52-b392-8c3b05de55a7: Pushing buffer of size 16000 to pipeline
2018-05-15 16:33:38 -   DEBUG:   decoder2: 126b1893-1a67-4f52-b392-8c3b05de55a7: Pushing buffer done
2018-05-15 16:33:38 -    INFO:   decoder2: 126b1893-1a67-4f52-b392-8c3b05de55a7: Connecting audio decoder
2018-05-15 16:33:38 -    INFO:   decoder2: 126b1893-1a67-4f52-b392-8c3b05de55a7: Connected audio decoder
ERROR ([5.4.86~1-9b90c]:OnlineTransform():online-feature.cc:421) Dimension mismatch: source features have dimension 117 and LDA #cols is 361

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::OnlineTransform::OnlineTransform(kaldi::MatrixBase<float> const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineIvectorFeature::OnlineIvectorFeature(kaldi::OnlineIvectorExtractionInfo const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(kaldi::OnlineNnet2FeaturePipelineInfo const&)

clone

terminate called after throwing an instance of 'std::runtime_error'
  what():
zsh: abort (core dumped)  python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c

aonotas commented 6 years ago

I updated the error output above. I now get a new error:

ERROR ([5.4.86~1-9b90c]:OnlineTransform():online-feature.cc:421) Dimension mismatch: source features have dimension 117 and LDA #cols is 361

alumae commented 6 years ago

Probably the mfcc.conf that you are using is not the same as was used to train the model. Also, are you using pitch features in your model?
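
The two numbers in the error message are consistent with this diagnosis. What follows is a minimal sketch of the arithmetic, under two assumptions that the thread does not state: that the iVector front end splices an odd number of frames (symmetric left and right context), and that the LDA matrix carries one extra column for an offset term, so that 361 columns correspond to 361 - 1 = 360 spliced feature dimensions. The function name candidate_splits is hypothetical.

```python
def candidate_splits(spliced_dim, lda_cols, max_frames=15):
    """List (frames, actual_dim, expected_dim) triples that fit both numbers.

    Assumption: the LDA matrix has one offset column, so it expects
    lda_cols - 1 spliced dimensions. Only odd frame counts are tried,
    which corresponds to symmetric left/right splicing context.
    """
    expected_spliced = lda_cols - 1
    results = []
    for frames in range(1, max_frames + 1, 2):
        if spliced_dim % frames == 0 and expected_spliced % frames == 0:
            results.append(
                (frames, spliced_dim // frames, expected_spliced // frames)
            )
    return results


if __name__ == "__main__":
    # Numbers copied from the error message in this thread.
    for frames, actual, expected in candidate_splits(117, 361):
        print(f"{frames} frames: got {actual} dims/frame, LDA expects {expected}")
```

Among the candidates this produces, the pair of 13 dimensions per frame received versus 40 expected (over 9 spliced frames) would fit a config that leaves --num-ceps at Kaldi's default of 13 being used with an extractor trained on 40-dimensional high-resolution MFCCs. The thread does not confirm which config was actually in use, so treat this as one plausible reading.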

aonotas commented 6 years ago

Thank you for your advice! I fixed the mfcc config file and that seems to solve the error! (I did not use pitch features!) Thanks!!
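
One quick way to compare the MFCC config used at decoding time against the one used for training is to parse the option lines of both files and list the differences. This is a minimal sketch, assuming the usual Kaldi config layout of one --name=value option per line with # starting a comment. The helper names parse_kaldi_conf and diff_confs are hypothetical, and the sample strings are illustrative inputs, not the files from this thread.

```python
def parse_kaldi_conf(text):
    """Parse '--name=value' option lines; '#' starts a comment."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line.startswith("--"):
            continue  # skip blank and comment-only lines
        name, _, value = line[2:].partition("=")
        options[name.strip()] = value.strip()
    return options


def diff_confs(train_text, decode_text):
    """Return {option: (train_value, decode_value)} for options that differ."""
    train = parse_kaldi_conf(train_text)
    decode = parse_kaldi_conf(decode_text)
    return {
        name: (train.get(name), decode.get(name))
        for name in sorted(set(train) | set(decode))
        if train.get(name) != decode.get(name)
    }


# Illustrative inputs only (assumed, not taken from the thread).
TRAIN_CONF = """
--use-energy=false   # trailing comments are ignored
--num-mel-bins=40
--num-ceps=40
"""
DECODE_CONF = """
--use-energy=false
"""

if __name__ == "__main__":
    differences = diff_confs(TRAIN_CONF, DECODE_CONF)
    for name, (train_value, decode_value) in differences.items():
        print(f"{name}: training={train_value} decoding={decode_value}")
```

A value of None on one side means the option is absent from that file, in which case Kaldi falls back to its built-in default for that option.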

Vishnu-Chittan commented 5 years ago

@alumae
Greetings! I am facing some problems while building Kaldi with GStreamer. I am attaching the worker.log file here. Please give me some idea of how to resolve it.

aonotas commented 5 years ago

@Vishnu-Chittan

ERROR ([5.4.176~1-be967]:ReadConfigFile():parse-options.cc:469) Cannot open config file: opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf

Please check that opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf exists.
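
Note that the path in the error message has no leading slash, so it is a relative path and is resolved against the directory the worker was started from. The sketch below shows where such a path ends up and whether a file exists there. It assumes the worker's working directory is passed as working_dir (defaulting to the current directory); resolve_config_path is a hypothetical helper name.

```python
from pathlib import Path


def resolve_config_path(path_str, working_dir="."):
    """Return (absolute_path, exists) for a path as seen from working_dir."""
    path = Path(path_str)
    if not path.is_absolute():
        # Relative paths are interpreted from the process working directory.
        path = Path(working_dir).resolve() / path
    return path, path.is_file()


if __name__ == "__main__":
    # Path copied from the error message above.
    conf = "opt/models/english/tedlium_nnet_ms_sp_online/conf/mfcc.conf"
    absolute, exists = resolve_config_path(conf)
    print(f"{absolute} -> {'found' if exists else 'missing'}")
```

If the file is reported missing, either start the worker from the directory that contains opt/, or use an absolute path in the YAML config.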

Vishnu-Chittan commented 5 years ago

# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false   # use average of log energy, not energy.
--num-mel-bins=40     # similar to Google's setup.
--num-ceps=40     # there is no dimensionality reduction.
--low-freq=20     # low cutoff frequency for mel bins... this is high-bandwidth data, so
                  # there might be some information at the low end.
--high-freq=-400 # high cutoff frequently, relative to Nyquist of 8000 (=7600)

Vishnu-Chittan commented 5 years ago

@aonotas Thank you so much for your quick reply. The text pasted above is the content of my mfcc.conf file.

Vishnu-Chittan commented 5 years ago

Instead of test, I am using opt in the model path. I am following the link below for reference: https://medium.com/@nikhilamunipalli/simple-guide-to-kaldi-an-efficient-open-source-speech-recognition-tool-for-extreme-beginners-98a48bb34756

00001101-xt commented 4 years ago

> Probably the mfcc.conf that you are using is not the same as was used to train the model. Also, are you using pitch features in your model?

Hi @alumae, is there a way to enable pitch features? I was expecting something like pitch-config=pitch.conf, which is not provided yet.

Thanks.