alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Using a pre-built nnet3 model #140

Open petgel opened 6 years ago

petgel commented 6 years ago

Hi, this is one of my first times working with ASR and Kaldi. I got your server running with the "tedlium_nnet_ms_sp_online" English model and everything works fine (with and without Docker). Then I tried using a pre-built German nnet3 model ("kaldi-generic-de-tdnn_sp" from https://github.com/gooofy/zamia-speech#asr-models). After some small errors I can now start the worker and server, but when I try to use your Python client I get a dimension mismatch error (see log below), and I basically don't know what to do next to resolve this problem.

libdc1394 error: Failed to initialize libdc1394
libudev: udev_has_devtmpfs: name_to_handle_at on /dev: Operation not permitted
DEBUG 2018-07-19 12:38:45,488 Starting up worker
INFO 2018-07-19 12:38:45,494 Creating decoder using conf: {'post-processor': "perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'", 'use-vad': False, 'decoder': {'ivector-extraction-config': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/ivectors_test_hires/conf/ivector_extractor.conf', 'lattice-beam': 6.0, 'acoustic-scale': 0.083, 'do-endpointing': True, 'beam': 10.0, 'mfcc-config': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc.conf', 'traceback-period-in-secs': 0.25, 'nnet-mode': 3, 'endpoint-silence-phones': '1:2:3:4:5:6:7:8:9:10', 'word-syms': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/words.txt', 'num-nbest': 10, 'max-active': 10000, 'fst': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/HCLG.fst', 'use-threaded-decoder': True, 'model': '/opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/final.mdl', 'chunk-length-in-secs': 0.25}, 'silence-timeout': 10, 'out-dir': 'tmp', 'use-nnet2': True}
INFO 2018-07-19 12:38:45,535 Setting decoder property: nnet-mode = 3
INFO 2018-07-19 12:38:45,536 Setting decoder property: ivector-extraction-config = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/ivectors_test_hires/conf/ivector_extractor.conf
INFO 2018-07-19 12:38:45,536 Setting decoder property: lattice-beam = 6.0
INFO 2018-07-19 12:38:45,536 Setting decoder property: acoustic-scale = 0.083
INFO 2018-07-19 12:38:45,536 Setting decoder property: do-endpointing = True
INFO 2018-07-19 12:38:45,536 Setting decoder property: beam = 10.0
INFO 2018-07-19 12:38:45,536 Setting decoder property: mfcc-config = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc.conf
INFO 2018-07-19 12:38:45,536 Setting decoder property: traceback-period-in-secs = 0.25
INFO 2018-07-19 12:38:45,536 Setting decoder property: endpoint-silence-phones = 1:2:3:4:5:6:7:8:9:10
INFO 2018-07-19 12:38:45,536 Setting decoder property: word-syms = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/words.txt
INFO 2018-07-19 12:38:45,975 Setting decoder property: num-nbest = 10
INFO 2018-07-19 12:38:45,975 Setting decoder property: max-active = 10000
INFO 2018-07-19 12:38:45,975 Setting decoder property: chunk-length-in-secs = 0.25
INFO 2018-07-19 12:38:45,975 Setting decoder property: fst = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/graph/HCLG.fst
INFO 2018-07-19 12:38:46,122 Setting decoder property: model = /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/model/final.mdl
LOG ([5.4.176~1-be967]:CompileLooped():nnet-compile-looped.cc:334) Spent 0.00969005 seconds in looped compilation.
INFO 2018-07-19 12:38:46,162 Created GStreamer elements
DEBUG 2018-07-19 12:38:46,162 Adding <__main__.GstAppSrc object at 0x7f7389867280 (GstAppSrc at 0x15cf7b0)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstDecodeBin object at 0x7f7389867320 (GstDecodeBin at 0x1714060)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstAudioConvert object at 0x7f7389867370 (GstAudioConvert at 0x171e0d0)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstAudioResample object at 0x7f73898673c0 (GstAudioResample at 0x150cf70)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstTee object at 0x7f7389867410 (GstTee at 0x172c000)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstQueue object at 0x7f7389867460 (GstQueue at 0x172e170)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstFileSink object at 0x7f73898674b0 (GstFileSink at 0x1732800)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstQueue object at 0x7f7389867500 (GstQueue at 0x172e460)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.Gstkaldinnet2onlinedecoder object at 0x7f7389867550 (Gstkaldinnet2onlinedecoder at 0x174c150)> to the pipeline
DEBUG 2018-07-19 12:38:46,163 Adding <__main__.GstFakeSink object at 0x7f73898675a0 (GstFakeSink at 0x1748d30)> to the pipeline
INFO 2018-07-19 12:38:46,163 Linking GStreamer elements
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.4.176~1-be967]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
INFO 2018-07-19 12:38:46,283 Setting pipeline to READY
INFO 2018-07-19 12:38:46,284 Set pipeline to READY
INFO 2018-07-19 12:38:46,285 Opening websocket connection to master server
INFO 2018-07-19 12:38:46,288 Opened websocket connection to server
DEBUG 2018-07-19 12:40:14,246 : Got message from server of type <class 'ws4py.messaging.TextMessage'>
INFO 2018-07-19 12:40:14,246 7413e34d-c84c-4fdd-8c1b-224ce0889036: Initializing request
INFO 2018-07-19 12:40:14,248 7413e34d-c84c-4fdd-8c1b-224ce0889036: Started timeout guard
DEBUG 2018-07-19 12:40:14,248 7413e34d-c84c-4fdd-8c1b-224ce0889036: Checking that decoder hasn't been silent for more than 10 seconds
INFO 2018-07-19 12:40:14,249 7413e34d-c84c-4fdd-8c1b-224ce0889036: Initialized request
DEBUG 2018-07-19 12:40:14,440 7413e34d-c84c-4fdd-8c1b-224ce0889036: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
DEBUG 2018-07-19 12:40:14,441 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer of size 4000 to pipeline
DEBUG 2018-07-19 12:40:14,443 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer done
DEBUG 2018-07-19 12:40:14,694 7413e34d-c84c-4fdd-8c1b-224ce0889036: Got message from server of type <class 'ws4py.messaging.BinaryMessage'>
DEBUG 2018-07-19 12:40:14,694 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer of size 4000 to pipeline
DEBUG 2018-07-19 12:40:14,695 7413e34d-c84c-4fdd-8c1b-224ce0889036: Pushing buffer done
INFO 2018-07-19 12:40:14,704 7413e34d-c84c-4fdd-8c1b-224ce0889036: Connecting audio decoder
INFO 2018-07-19 12:40:14,705 7413e34d-c84c-4fdd-8c1b-224ce0889036: Connected audio decoder
ERROR ([5.4.176~1-be967]:OnlineTransform():online-feature.cc:421) Dimension mismatch: source features have dimension 91 and LDA #cols is 280

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::OnlineTransform::OnlineTransform(kaldi::MatrixBase<float> const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineIvectorFeature::OnlineIvectorFeature(kaldi::OnlineIvectorExtractionInfo const&, kaldi::OnlineFeatureInterface*)
kaldi::OnlineNnet2FeaturePipeline::OnlineNnet2FeaturePipeline(kaldi::OnlineNnet2FeaturePipelineInfo const&)

clone

terminate called after throwing an instance of 'std::runtime_error'
  what():

alumae commented 6 years ago

Most likely, your MFCC conf file (decoder->mfcc-config in YAML) is not compatible with your acoustic model.
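
A note on why the numbers come out as 91 vs. 280: the default low-resolution mfcc.conf produces 13-dimensional MFCCs, and the online i-vector extractor splices 7 frames together before applying its LDA-like transform (13 × 7 = 91), while that transform was estimated on 40-dimensional high-resolution MFCCs (40 × 7 = 280). A minimal sketch of the corresponding yaml change, assuming the model also ships a high-resolution MFCC config (commonly conf/mfcc_hires.conf) next to mfcc.conf; the path below is illustrative, reusing the directory layout from the log above:

decoder:
    # point the worker at the high-resolution MFCC config that matches the nnet3 acoustic model
    mfcc-config: /opt/models/german/kaldi-generic-de-tdnn_sp-r20180611/conf/mfcc_hires.conf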

svenha commented 5 years ago

I had the same error message (not in the context of kaldi-gstreamer-server, but for my calls to the kaldi script online2-wav-nnet3-latgen-faster) when using such a model. I had to change the mfcc-config value to .../mfcc_hires.conf and it worked as expected.

BaderEddineB commented 4 years ago

I had the same error with a French model. How can we fix it?

BaderEddineB commented 4 years ago

@svenha I didn't understand what you did!

svenha commented 4 years ago

@BaderEddineB Can you provide details about your model for French?

BaderEddineB commented 4 years ago

I tried to use this pre-built French nnet3 model (https://github.com/pguyot/zamia-speech/releases/download/20190930/kaldi-generic-fr-tdnn_f-r20191016.tar.xz) with kaldi-gstreamer-server (the kaldinnet2onlinedecoder plugin). I can start the worker and server correctly, but when I try to use the Python client to decode an audio file I get a dimension mismatch error (ERROR ([5.5.732~1-67db3]:OnlineTransform():online-feature.cc:533) Dimension mismatch: source features have dimension 91 and LDA #cols is 280).


This is my yaml file:

use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    nnet-mode: 3
    use-threaded-decoder: true
    model: test/models/Zamia-fr/model/final.mdl
    #lda-mat: test/models/Zamia-fr/extractor/final.mat
    word-syms: test/models/Zamia-fr/model/graph/words.txt
    fst: test/models/Zamia-fr/model/graph/HCLG.fst
    mfcc-config: test/models/Zamia-fr/conf/mfcc.conf
    ivector-extraction-config: test/models/Zamia-fr/ivectors_test_hires/conf/ivector_extractor.conf
    max-active: 10000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.083
    do-endpointing: true
    endpoint-silence-phones: "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    num-nbest: 1
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} sleep(1); s/(.*)/\1./;'
post-processor: (while read LINE; do echo $LINE; done)

# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
full-post-processor: ./sample_full_post_processor.py

logging:
    version: 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]


mfcc.conf:

--use-energy=false
--sample-frequency=16000


mfcc_hires.conf:

# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false   # use average of log energy, not energy.
--num-mel-bins=40    # similar to Google's setup.
--num-ceps=40        # there is no dimensionality reduction.
--low-freq=20        # low cutoff frequency for mel bins... this is high-bandwidth data, so
                     # there might be some information at the low end.
--high-freq=-400     # high cutoff frequently, relative to Nyquist of 8000 (=7600)


ivector_extractor.conf:

--cmvn-config=exp/nnet3_chain/ivectors_test_hires/conf/online_cmvn.conf
--ivector-period=10
--splice-config=exp/nnet3_chain/ivectors_test_hires/conf/splice.conf
--lda-matrix=exp/nnet3_chain/extractor/final.mat
--global-cmvn-stats=exp/nnet3_chain/extractor/global_cmvn.stats
--diag-ubm=exp/nnet3_chain/extractor/final.dubm
--ivector-extractor=exp/nnet3_chain/extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=0
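
Note that the paths inside this ivector_extractor.conf still point into the training directory tree (exp/nnet3_chain/...), so the worker can only load the extractor if those paths are rewritten to wherever the model was unpacked. A minimal sketch of the rewritten path options, assuming the extractor files sit under test/models/Zamia-fr/extractor/ and the conf files under test/models/Zamia-fr/ivectors_test_hires/conf/ as in the yaml above (the remaining, non-path options stay unchanged):

--cmvn-config=test/models/Zamia-fr/ivectors_test_hires/conf/online_cmvn.conf
--splice-config=test/models/Zamia-fr/ivectors_test_hires/conf/splice.conf
--lda-matrix=test/models/Zamia-fr/extractor/final.mat
--global-cmvn-stats=test/models/Zamia-fr/extractor/global_cmvn.stats
--diag-ubm=test/models/Zamia-fr/extractor/final.dubm
--ivector-extractor=test/models/Zamia-fr/extractor/final.ie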

svenha commented 4 years ago

The following helped me with the dimension mismatch error: in the yaml file, change the mfcc-config value from mfcc.conf to mfcc_hires.conf. But my context was a German model built with zamia-speech.

svenha commented 4 years ago

@BaderEddineB I read in the kaldi group that you found a solution: https://groups.google.com/g/kaldi-help/c/taXB2D1v2D4 . Please report the details of your solution here so that others can learn from it.

BaderEddineB commented 4 years ago

@svenha I'm sorry for the late response. In fact I did the same thing as you: I replaced the content of the mfcc.conf file with the content of the mfcc_hires.conf file, and I changed the directories in ivector_extractor.conf, and it worked.