alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Beams have no effect for nnet3 models #100

Closed: nshmyrev closed this issue 3 years ago

nshmyrev commented 3 years ago

With this setup:

https://github.com/laurensw75/docker-Kaldi-NL

with this yml file:

description:
    language: Dutch
    identifier: CGN_all
    modeltype: NNet3
use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    nnet-mode : 3
    use-threaded-decoder : true
    model : /opt/kaldi-gstreamer-server/mod/final.mdl
    word-syms : /opt/kaldi-gstreamer-server/mod/words.txt
    fst : /opt/kaldi-gstreamer-server/mod/HCLG.fst
    mfcc-config : /opt/kaldi-gstreamer-server/mod/conf/mfcc.conf
    ivector-extraction-config : /opt/kaldi-gstreamer-server/mod/conf/ivector_extractor.conf
    frame-subsampling-factor : 3
    max-active: 7000
    beam: 10.0
    lattice-beam: 6.0
    acoustic-scale: 0.9
    do-endpointing : true
    endpoint-silence-phones : "1:2:3:4:5"
    endpoint-rule1-min-trailing-silence : 1.0
    traceback-period-in-secs : 0.25
    chunk-length-in-secs : 0.25
    num-nbest : 1
    #Additional functionality that you can play with:
    lm-fst : /opt/kaldi-gstreamer-server/mod/G.fst
    big-lm-const-arpa : /opt/kaldi-gstreamer-server/mod/G.carpa
    phone-syms : /opt/kaldi-gstreamer-server/mod/phones.txt
    word-boundary-file : /opt/kaldi-gstreamer-server/mod/word_boundary.int
#    do-phone-alignment : false
# If specified, this location stores all audio in 'raw' format    
out-dir: tmp

use-vad: False
silence-timeout: 120

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

# A sample full post-processor that adds a confidence score to the 1-best hyp and deletes the other n-best hyps
full-post-processor: /opt/kaldi-gstreamer-server/sample_full_post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]

the beam configuration has no effect at all. Decoding is very slow (and slightly more accurate) regardless of which max-active or beam value is configured in the yml file. It seems that the default values (beam=16, max-active=27618..., lattice-beam=10) are used instead.
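
One way to confirm this kind of symptom is to compare the configured values with the values that are actually in effect. The sketch below is a minimal, hypothetical helper (the name `find_ignored_properties` is not part of any library). Here the "effective" values are simply the defaults quoted in this report; in a live pipeline they could presumably be read back from the decoder element with GObject's `get_property`, assuming the properties are readable. `max-active` is left out because its default is truncated in the report.

```python
# Values the report says appear to be in effect (the plugin defaults).
# max-active is omitted because its default is truncated in the report.
REPORTED_DEFAULTS = {"beam": 16.0, "lattice-beam": 10.0}

# Values requested in the decoder section of the yml file above.
CONFIGURED = {"beam": 10.0, "lattice-beam": 6.0}


def find_ignored_properties(configured, effective, tolerance=1e-6):
    """Return {name: (configured, effective)} for each property whose
    effective value is missing or differs from the configured value."""
    ignored = {}
    for name, wanted in configured.items():
        actual = effective.get(name)
        if actual is None or abs(float(actual) - float(wanted)) > tolerance:
            ignored[name] = (wanted, actual)
    return ignored


if __name__ == "__main__":
    # REPORTED_DEFAULTS is a stand-in for values read back from the element.
    mismatches = find_ignored_properties(CONFIGURED, REPORTED_DEFAULTS)
    for name, (wanted, actual) in mismatches.items():
        print(f"{name}: configured {wanted}, effective {actual}")
```

An empty result would mean every configured value reached the decoder; any entry points at a property that was silently dropped.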

The problem seems to be related to the following piece of code:

https://github.com/alumae/gst-kaldi-nnet2-online/blob/31d77e0ec34a8160bde60927022e4f262af1b935/src/gstkaldinnet2onlinedecoder.cc#L535

alumae commented 3 years ago

Although I couldn't reproduce it with my own nnet3 setup, the code block that you pointed out is obviously wrong (since nnet3 does not have a threaded decoder) and is now fixed.
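
For readers unfamiliar with this class of bug, here is a toy model of how it can arise. It is purely illustrative and is not taken from the plugin's source: the class, the function, and the `fixed` flag are all hypothetical. The assumption is that property values are routed into one of two config objects, and that the routing looks only at `use-threaded-decoder`, while an nnet3 decoder always reads the non-threaded config.

```python
class DecoderOptions:
    """Toy stand-in for a decoder config, starting at the defaults
    quoted in the report."""

    def __init__(self):
        self.values = {"beam": 16.0, "lattice-beam": 10.0}


def apply_properties(props, nnet_mode, use_threaded, fixed):
    """Route props into a config object and return the values the decoder
    would actually read. Hypothetical routing, not the plugin's code."""
    threaded_cfg = DecoderOptions()  # read only by a threaded nnet2 decoder
    simple_cfg = DecoderOptions()    # read by every non-threaded decoder

    # nnet3 has no threaded decoder, so it always reads simple_cfg.
    if nnet_mode == 2 and use_threaded:
        decoder_reads = threaded_cfg
    else:
        decoder_reads = simple_cfg

    if fixed:
        # Write to the same object the decoder reads.
        target = decoder_reads
    else:
        # Buggy routing: chooses by use_threaded alone, ignoring nnet_mode.
        target = threaded_cfg if use_threaded else simple_cfg

    target.values.update(props)
    return decoder_reads.values


if __name__ == "__main__":
    props = {"beam": 10.0, "lattice-beam": 6.0}
    print(apply_properties(props, nnet_mode=3, use_threaded=True, fixed=False))
    print(apply_properties(props, nnet_mode=3, use_threaded=True, fixed=True))
```

In this model the mismatch only shows up when `nnet-mode` is 3 and `use-threaded-decoder` is true at the same time, which is the combination in the yml file above.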

nshmyrev commented 3 years ago

Thank you!