alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

What kind of features should be included in nnet2.yaml? #173

Open hcnoh opened 5 years ago

hcnoh commented 5 years ago

I would like to run our custom Kaldi TDNN model on the GStreamer server, but I'm having a hard time writing our own nnet2.yaml file.

Could you tell me exactly which properties should go into this nnet2.yaml file?

Thanks.

gilamsalem commented 5 years ago

This is my YAML file for the ASpIRE s5 model (http://kaldi-asr.org/models/m1):

timeout-decoder : 10
use-nnet2: True

decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    use-threaded-decoder:  true
    model : /kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/final.mdl
    word-syms : /kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt
    fst : /kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst
    mfcc-config : /kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/mfcc.conf
    ivector-extraction-config : /kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/ivector_extractor.conf
    min-active: 200
    max-active: 7000
    beam: 15.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing : false
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 1
    nnet-mode: 3

out-dir: /app/tmp
use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
# post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'
# A sample full post processor that add a confidence score to 1-best hyp and deletes other n-best hyps
# full-post-processor: ./sample_full_post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s.%(msecs)03d - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]

Note that I disabled all post-processing and set num-nbest to 1, so only a single result is returned.
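When adapting a file like this to another model, a path that does not exist on the worker machine is an easy mistake to make. The following is only a minimal sketch of a pre-flight check: the helper name `missing_decoder_files` and the list of path-valued keys are assumptions taken from the config above, not part of kaldi-gstreamer-server.

```python
import os

# Keys from the config above whose values are file paths (assumed list;
# extend it if your decoder section references other files).
PATH_KEYS = (
    "model",
    "word-syms",
    "fst",
    "mfcc-config",
    "ivector-extraction-config",
)


def missing_decoder_files(decoder_conf):
    """Return (key, path) pairs whose path does not point to an existing file."""
    missing = []
    for key in PATH_KEYS:
        path = decoder_conf.get(key)
        if path is not None and not os.path.isfile(path):
            missing.append((key, path))
    return missing
```

If PyYAML is installed, the `decoder` mapping can be obtained with `yaml.safe_load(open("nnet2.yaml"))["decoder"]` and passed to the function; an empty result means every listed file was found.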

hcnoh commented 5 years ago

Thank you for your reply. I followed your YAML configuration, but I get an error message like this:

DEBUG 2019-03-11 04:29:14,388 Starting up worker
2019-03-11 04:29:14 -    INFO:   decoder2: Creating decoder using conf: {'post-processor': "per>
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: nnet-mode = 3
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: ivector-extraction-config 
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: min-active = 200
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: lattice-beam = 6.0
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: acoustic-scale = 1.0
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: do-endpointing = False
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: beam = 15.0
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: mfcc-config = /opt/models/>
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: traceback-period-in-secs =>
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: endpoint-silence-phones = >
2019-03-11 04:29:14 -    INFO:   decoder2: Setting decoder property: word-syms = /opt/models/td>
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: num-nbest = 1
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: frame-subsampling-factor =>
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: max-active = 7000
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: chunk-length-in-secs = 0.25
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: fst = /opt/models/tdnn1a_s>
2019-03-11 04:29:15 -    INFO:   decoder2: Setting decoder property: model = /opt/models/tdnn1a>
ERROR ([5.4.176~1-be967]:ReadNew():nnet-component-itf.cc:86) Unknown component type TdnnCompone>

[ Stack-Trace: ]                                                                                
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)              
kaldi::MessageLogger::~MessageLogger()                                                          
kaldi::nnet3::Component::ReadNew(std::istream&, bool)                                           
kaldi::nnet3::Nnet::Read(std::istream&, bool)                                                   
kaldi::nnet3::AmNnetSimple::Read(std::istream&, bool)

What is the reason for this?

alumae commented 5 years ago

The Kaldi distribution that the GStreamer plugin is linked against is too old to recognize a component type used by your model.
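The stack trace above fails in `kaldi::nnet3::Component::ReadNew`, that is, while reading a component from the model file. As a rough sketch, one could list the component types a model file mentions and compare them with what the linked Kaldi build supports. This assumes that component type names appear in `final.mdl` as ASCII tokens of the form `<SomeComponent>`; the function name `component_types` is hypothetical.

```python
import re

# Assumption: component type names are stored as ASCII tokens such as
# "<SomeComponent>" inside the model file. Closing tokens ("</...>") and
# tokens that merely contain the word (e.g. "<NumComponents>") do not match.
COMPONENT_TOKEN = re.compile(rb"<([A-Za-z0-9]+Component)>")


def component_types(model_path):
    """Return the sorted, de-duplicated component type names found in a model file."""
    with open(model_path, "rb") as f:
        data = f.read()  # reads the whole file into memory
    return sorted({name.decode("ascii") for name in COMPONENT_TOKEN.findall(data)})
```

Running it on the model that fails to load shows which component types a newer Kaldi build would need to provide.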