alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

Calling the same call twice might return different results #160

Closed: gilamsalem closed this issue 5 years ago

gilamsalem commented 5 years ago

I have a kaldi-gstreamer-server setup running the Aspire nnet3 model (http://kaldi-asr.org/models/m4). Sometimes, when I send the same call to my server more than once, I get different results. The results are not completely different: I get either a noise/laughter token or the actual transcript. In the following example, my Java decoder streams a file to my kaldi-gstreamer server:

>>> java -jar ./client/target/decoder.jar --file ~/voice/demo/file.wav
[laughter]

>>> java -jar ./client/target/decoder.jar --file ~/voice/demo/file.wav
london fooled by beatles five

What can cause such behavior?
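One way to make a report like this more concrete is to decode the same file many times and count how often each transcript appears. The Python sketch below is hypothetical: `tally_transcripts` is not part of kaldi-gstreamer-server or its Java client, and `decode` stands for any function you write that sends a file to the server and returns the final transcript.

```python
from collections import Counter
from typing import Callable


def tally_transcripts(decode: Callable[[str], str], wav_path: str, runs: int = 10) -> Counter:
    """Decode the same file `runs` times and count each distinct transcript.

    `decode` is a hypothetical stand-in for your own client call, e.g. a
    wrapper that runs the Java decoder and captures its stdout.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Strip surrounding whitespace so trailing newlines do not split counts.
    return Counter(decode(wav_path).strip() for _ in range(runs))


if __name__ == "__main__":
    # Fake decoder for illustration only: it alternates between the two
    # outputs reported in this issue, mimicking a nondeterministic server.
    outputs = iter(["[laughter]", "london fooled by beatles five"] * 5)
    counts = tally_transcripts(lambda path: next(outputs), "file.wav", runs=10)
    for text, n in counts.most_common():
        print(f"{n:3d}  {text}")
```

A single run that matches the expected transcript says little; a tally over many runs shows whether the variation is rare or frequent.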

gilamsalem commented 5 years ago

My worker YAML file:

timeout-decoder : 10
use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    use-threaded-decoder:  true
    model : $ROOT/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/final.mdl
    word-syms : $ROOT/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt
    fst : $ROOT/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst
    mfcc-config : $ROOT/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/mfcc.conf
    ivector-extraction-config : $ROOT/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/ivector_extractor.conf
    min-active: 200
    max-active: 7000
    beam: 15.0
    lattice-beam: 6.0
    acoustic-scale: 1.0
    do-endpointing : false
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    traceback-period-in-secs: 0.25
    chunk-length-in-secs: 0.25
    frame-subsampling-factor: 3
    num-nbest: 10
    nnet-mode: 3
out-dir: tmp
use-vad: False
silence-timeout: 10

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
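Nothing in this worker config sets dithering directly; in Kaldi it is an MFCC frame-extraction option (`--dither`, default 1.0), so it would come from the file that `mfcc-config` points to. The sketch below is a hypothetical helper, not part of kaldi-gstreamer-server: it assumes a Kaldi-style config with one `--name=value` option per line and `#` comments, and reports the dither value that file would produce, falling back to Kaldi's default when the option is absent.

```python
import io
from typing import TextIO

KALDI_DEFAULT_DITHER = 1.0  # default of Kaldi's --dither frame-extraction option


def effective_dither(conf: TextIO) -> float:
    """Return the --dither value a Kaldi-style config file would set.

    Lines look like '--name=value'; '#' starts a comment. If --dither is
    absent, the default applies; if it appears more than once, the last
    occurrence wins (assumed to match how later options override earlier ones).
    """
    dither = KALDI_DEFAULT_DITHER
    for raw in conf:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("--dither="):
            dither = float(line[len("--dither="):])
    return dither


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    sample = io.StringIO("--use-energy=false  # example line\n--sample-frequency=8000\n")
    print(effective_dither(sample))  # no --dither line, so the default 1.0
```

To check a real setup, open the file named by `mfcc-config` and pass the file object to `effective_dither`.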
alumae commented 5 years ago

That's OK: feature extraction has random effects due to dithering, which adds a small amount of random noise to the audio before the features are computed.
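To illustrate the effect, here is a toy Python sketch, not Kaldi's code: it adds Gaussian noise scaled by a dither value to each sample, which is what Kaldi's frame-extraction dithering does, and then computes a simple log frame energy instead of real MFCCs. On quiet audio, two runs with different random seeds give slightly different features, while a dither value of 0 makes the output identical. In Kaldi this corresponds to the `--dither` option (default 1.0); setting `--dither=0` in the file referenced by `mfcc-config` should make feature extraction deterministic, assuming there is no other source of randomness in the decoding pipeline.

```python
import math
import random
from typing import List, Optional


def dither(samples: List[float], dither_value: float, rng: random.Random) -> List[float]:
    # Add Gaussian noise scaled by dither_value to every sample; skip when 0.
    if dither_value == 0.0:
        return list(samples)
    return [s + rng.gauss(0.0, 1.0) * dither_value for s in samples]


def frame_log_energy(samples: List[float], frame_len: int = 80) -> List[float]:
    # Toy feature: log energy of non-overlapping frames (80 samples = 10 ms at 8 kHz).
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = sum(x * x for x in samples[start:start + frame_len])
        feats.append(math.log(max(energy, 1e-10)))
    return feats


def extract(samples: List[float], dither_value: float, seed: Optional[int] = None) -> List[float]:
    # Each call uses its own RNG, like separate requests to a server.
    rng = random.Random(seed)
    return frame_log_energy(dither(samples, dither_value, rng))


if __name__ == "__main__":
    # Very quiet synthetic tone (amplitude 2 on the int16 scale), 0.1 s at 8 kHz.
    quiet = [2.0 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    a, b = extract(quiet, 1.0, seed=1), extract(quiet, 1.0, seed=2)
    print("dither=1.0, max feature difference:", max(abs(x - y) for x, y in zip(a, b)))
    c, d = extract(quiet, 0.0, seed=1), extract(quiet, 0.0, seed=2)
    print("dither=0.0, identical features:", c == d)
```

Because the noise is on the order of one int16 unit, its influence is largest on low-energy audio; on loud speech the feature differences are much smaller.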