alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Using a pre-built nnet3 model #140

Open petgel opened 6 years ago

petgel commented 6 years ago

Hi, this is one of my first times working with ASR and Kaldi. I got your server running with the "tedlium_nnet_ms_sp_online" English model and everything works fine (with and without Docker). Then I tried using a pre-built German nnet3 model ("kaldi-generic-de-tdnn_sp" from https://github.com/gooofy/zamia-speech#asr-models). After some small errors I can now start the worker and server, but when I try to use your Python client I get a dimension mismatch error (see log below), and I basically don't know what to do next to resolve this problem.

libdc1394 error: Failed to initialize libdc1394
libudev: udev_has_devtmpfs: name_to_handle_at on /dev: Operation not permitted
DEBUG 2018-07-19 12:38:45,488 Starting up worker
INFO 2018-07-19 12:38:45,494 Creating decoder using conf: {'post-processor': "perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'", 'use-vad': False, 'decoder': {'ivector-extraction-config': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/ivectors_test_hires/conf/ivector_extractor.conf', 'lattice-beam': 6.0, 'acoustic-scale': 0.083, 'do-endpointing': True, 'beam': 10.0, 'mfcc-config': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc.conf', 'traceback-period-in-secs': 0.25, 'nnet-mode': 3, 'endpoint-silence-phones': '1:2:3:4:5:6:7:8:9:10', 'word-syms': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/words.txt', 'num-nbest': 10, 'max-active': 10000, 'fst': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/HCLG.fst', 'use-threaded-decoder': True, 'model': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/final.mdl', 'chunk-length-in-secs': 0.25}, 'silence-timeout': 10, 'out-dir': 'tmp', 'use-nnet2': True}
INFO 2018-07-19 12:38:45,535 Setting decoder property: nnet-mode = 3
INFO 2018-07-19 12:38:45,536 Setting decoder property: ivector-extraction-config = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/ivectors_test_hires/conf/ivector_extractor.conf
INFO 2018-07-19 12:38:45,536 Setting decoder property: lattice-beam = 6.0
INFO 2018-07-19 12:38:45,536 Setting decoder property: acoustic-scale = 0.083
INFO 2018-07-19 12:38:45,536 Setting decoder property: do-endpointing = True
INFO 2018-07-19 12:38:45,536 Setting decoder property: beam = 10.0
INFO 2018-07-19 12:38:45,536 Setting decoder property: mfcc-config = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc.conf
INFO 2018-07-19 12:38:45,536 Setting decoder property: traceback-period-in-secs = 0.25
INFO 2018-07-19 12:38:45,536 Setting decoder property: endpoint-silence-phones = 1:2:3:4:5:6:7:8:9:10
INFO 2018-07-19 12:38:45,536 Setting decoder property: word-syms = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/words.txt
INFO 2018-07-19 12:38:45,975 Setting decoder property: num-nbest = 10
INFO 2018-07-19 12:38:45,975 Setting decoder property: max-active = 10000
INFO 2018-07-19 12:38:45,975 Setting decoder property: chunk-length-in-secs = 0.25
INFO 2018-07-19 12:38:45,975 Setting decoder property: fst = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/HCLG.fst
INFO 2018-07-19 12:38:46,122 Setting decoder property: model = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/final.mdl
LOG ([5.4.176~1-be967]:CompileLooped():nnet-compile-looped.cc:334) Spent 0.00969005 seconds in looped compilation.
INFO 2018-07-19 12:38:46,162 Created GStreamer elements
DEBUG 2018-07-19 12:38:46,162 Adding <__main__.GstAppSrc object at 0x7f7389867280 (GstAppSrc at 0x15cf7b0)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstDecodeBin object at 0x7f7389867320 (GstDecodeBin at 0x1714060)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstAudioConvert object at 0x7f7389867370 (GstAudioConvert at 0x171e0d0)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstAudioResample object at 0x7f73898673c0 (GstAudioResample at 0x150cf70)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstTee object at 0x7f7389867410 (GstTee at 0x172c000)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstQueue object at 0x7f7389867460 (GstQueue at 0x172e170)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstFileSink object at 0x7f73898674b0 (GstFileSink at 0x1732800)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstQueue object at 0x7f7389867500 (GstQueue at 0x172e460)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.Gstkaldinnet2onlinedecoder object at 0x7f7389867550 (Gstkaldinnet2onlinedecoder at 0x174c150)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstFakeSink object at 0x7f73898675a0 (GstFakeSink at 0x1748d30)> to the pipeline
INFO 2018-07-19 12:38:46,163 Linking GStreamer elements
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO 2018-07-19 12:38:46,283 Setting pipeline to READY
INFO 2018-07-19 12:38:46,284 Set pipeline to READY
INFO 2018-07-19 12:38:46,285 Opening websocket connection to master server
INFO 2018-07-19 12:38:46,288 Opened websocket connection to server
DEBUG 2018-07-19 12:40:14,246 : Got message from server of type <class 'ws4py.messaging.TextMessage'>
INFO 2018-07-19 12:40:14,246 7413e34d-c84c-4fdd-8c1b-224ce0889036: Initializing request
INFO 2018-07-19 12:40:14,248 7413e34d-c84c-4fdd-8c1b-224ce0889036: Started timeout guard
DEBUG 2018-07-19 12:40:14,248 7413e34d-c84c-4fdd-8c1b-224ce0889036: Checking that decoder hasn't been silent for more than 10 seconds
INFO 2018-07-19 12:40:14,249 7413e34d-c84c-4fdd-8c1b-224ce0889036: Initialized request
DEBUG 2018-07-19 12:40:14,440 7413e34d-c84c-4fdd-8c1b-224ce0889036: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
DEBUG 2018-07-19 12:40:14,441 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer of size 4000 to pipeline
DEBUG 2018-07-19 12:40:14,443 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer done
DEBUG 2018-07-19 12:40:14,694 7413e34d-c84c-4fdd-8c1b-224ce0889036: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
DEBUG 2018-07-19 12:40:14,694 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer of size 4000 to pipeline
DEBUG 2018-07-19 12:40:14,695 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer done
INFO 2018-07-19 12:40:14,704 7413e34d-c84c-4fdd-8c1b-224ce0889036: Connecting audio decoder
INFO 2018-07-19 12:40:14,705 7413e34d-c84c-4fdd-8c1b-224ce0889036: Connected audio decoder
ERROR ([5.4.176~1-be967]:OnlineTransform():online-feature.cc:421) Dimension mismatch: source features have dimension 91 and LDA #cols is 280

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::OnlineTransform::OnlineTransform(kaldi::MatrixBase<float> const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineIvectorFeature::OnlineIvectorFeature(kaldi::OnlineIvectorExtractionInfo const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(kaldi::OnlineNnet2FeaturePipelineInfo const&)

clone

terminate called after throwing an instance of 'std::runtime_error'
  what():

alumae commented 6 years ago

Most likely, your MFCC conf file (decoder->mfcc-config in YAML) is not compatible with your acoustic model.
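
A note on why the numbers come out as 91 vs. 280: the default low-resolution mfcc.conf produces 13-dimensional MFCCs, and the online i-vector extractor splices 7 frames together before applying its LDA-like transform (13 × 7 = 91), while that transform was estimated on 40-dimensional high-resolution MFCCs (40 × 7 = 280). A minimal sketch of the corresponding yaml change, assuming the model also ships a high-resolution MFCC config (commonly conf/mfcc_hires.conf) next to mfcc.conf; the path below is illustrative, reusing the directory layout from the log above:

decoder:
    # point the worker at the high-resolution MFCC config that matches the nnet3 acoustic model
    mfcc-config: /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc_hires.conf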

svenha commented 5 years ago

I had the same error message (not in the context of kaldi-gstreamer-server, but for my calls to the kaldi script online2-wav-nnet3-latgen-faster) when using such a model. I had to change the mfcc-config value to .../mfcc_hires.conf and it worked as expected.

BaderEddineB commented 4 years ago

I had the same error with a French model. How can we fix it?

BaderEddineB commented 4 years ago

@svenha I didn't understand what you did!

svenha commented 4 years ago

@BaderEddineB Can you provide details about your model for French?

BaderEddineB commented 4 years ago

I tried to use this pre-built French nnet3 model (https://github.com/pguyot/zamia-speech/releases/download/20190930/kaldi-generic-fr-tdnn_f-r20191016.tar.xz) with kaldi-gstreamer-server (the kaldinnet2onlinedecoder plugin). I can start the worker and server correctly, but when I try to use the Python client to decode an audio file I get a dimension mismatch error (ERROR ([5.5.732~1-67db3]:OnlineTransform():online-feature.cc:533) Dimension mismatch: source features have dimension 91 and LDA #cols is 280).


This is my yaml file:

use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    nnet-mode: 3
    use-threaded-decoder: true
    model: test/models/Zamia-fr/model/final.mdl
    #lda-mat: test/models/Zamia-fr/extractor/final.mat
    word-syms: test/models/Zamia-fr/model/graph/words.txt
    fst: test/models/Zamia-fr/model/graph/HCLG.fst
    mfcc-config: test/models/Zamia-fr/conf/mfcc.conf
    ivector-extraction-config: test/models/Zamia-fr/ivectors_test_hires/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.083
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    num-nbest: 1
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} sleep(1); s/(.*)/\1./;'
post-processor: (while read LINE; do echo $LINE; done)

# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
full-post-processor: ./sample_full_post_processor.py

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]


mfcc.conf:

--use-energy=false
--sample-frequency=16000


mfcc_hires.conf:

# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false   # use average of log energy, not energy.
--num-mel-bins=40    # similar to Google's setup.
--num-ceps=40        # there is no dimensionality reduction.
--low-freq=20        # low cutoff frequency for mel bins... this is high-bandwidth data, so
                     # there might be some information at the low end.
--high-freq=-400     # high cutoff frequently, relative to Nyquist of 8000 (=7600)


ivector_extractor.conf:

--cmvn-config=exp/nnet3_chain/ivectors_test_hires/conf/online_cmvn.conf
--ivector-period=10
--splice-config=exp/nnet3_chain/ivectors_test_hires/conf/splice.conf
--lda-matrix=exp/nnet3_chain/extractor/final.mat
--global-cmvn-stats=exp/nnet3_chain/extractor/global_cmvn.stats
--diag-ubm=exp/nnet3_chain/extractor/final.dubm
--ivector-extractor=exp/nnet3_chain/extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=0
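
Note that the paths inside this ivector_extractor.conf still point into the training directory tree (exp/nnet3_chain/...), so the worker can only load the extractor if those paths are rewritten to wherever the model was unpacked. A minimal sketch of the rewritten path options, assuming the extractor files sit under test/models/Zamia-fr/extractor/ and the conf files under test/models/Zamia-fr/ivectors_test_hires/conf/ as in the yaml above (the remaining, non-path options stay unchanged):

--cmvn-config=test/models/Zamia-fr/ivectors_test_hires/conf/online_cmvn.conf
--splice-config=test/models/Zamia-fr/ivectors_test_hires/conf/splice.conf
--lda-matrix=test/models/Zamia-fr/extractor/final.mat
--global-cmvn-stats=test/models/Zamia-fr/extractor/global_cmvn.stats
--diag-ubm=test/models/Zamia-fr/extractor/final.dubm
--ivector-extractor=test/models/Zamia-fr/extractor/final.ie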

svenha commented 4 years ago

The following helped me with the dimension mismatch error: in the yaml file, change the mfcc-config value from mfcc.conf to mfcc_hires.conf. But my context was a German model built with zamia-speech.

svenha commented 4 years ago

@BaderEddineB I read in the kaldi group that you found a solution: https://groups.google.com/g/kaldi-help/c/taXB2D1v2D4 . Please report the details of your solution here so that others can learn from it.

BaderEddineB commented 4 years ago

@svenha I'm sorry for the late response. In fact I did the same thing as you: I replaced the content of the mfcc.conf file with the content of the mfcc_hires.conf file, and I changed the directories in ivector_extractor.conf, and it worked.