alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

How to return words in nnet results? #32

Closed: dgonzo closed this issue 7 years ago

dgonzo commented 8 years ago

I have this working with the Fisher nnet2 model and would like to return word-level results, as shown in the gst-kaldi-nnet2-online structured results.
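For concreteness, this is roughly the per-hypothesis shape I'm after (illustrative only; I'm inferring the field names from the gst-kaldi-nnet2-online full results, so they may not match exactly):

# Illustrative only: the word-level detail I'd like each hypothesis to carry
# alongside the transcript (field names are an assumption, not copied
# verbatim from the plugin output).
expected_hypothesis = {
    "transcript": "hello world",
    "likelihood": 100.0,
    "word-alignment": [
        {"word": "hello", "start": 0.10, "length": 0.30},
        {"word": "world", "start": 0.45, "length": 0.40},
    ],
}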

I've made the changes below to the config file and the post-processor, but I'm not getting anything back. In the worker log I can see that the most recent segment is produced with the expected data, but it never reaches the client (both the Python and HTTP clients just stall).

fisher_english_nnet2.yaml

use-nnet2: True
decoder:
    # All the properties nested here correspond to the kaldinnet2onlinedecoder GStreamer plugin properties.
    # Use gst-inspect-1.0 ./libgstkaldionline2.so kaldinnet2onlinedecoder to discover the available properties
    use-threaded-decoder:  true
    model : test/models/english/fisher_nnet_a_gpu_online/final.mdl
    fst : test/models/english/fisher_nnet_a_gpu_online/HCLG.fst
    word-syms : test/models/english/fisher_nnet_a_gpu_online/words.txt
    feature-type : mfcc
    mfcc-config : test/models/english/fisher_nnet_a_gpu_online/conf/mfcc.conf
    ivector-extraction-config : test/models/english/fisher_nnet_a_gpu_online/conf/ivector_extractor.fixed.conf
    max-active: 10000
    beam: 11.0
    lattice-beam: 5.0
    do-endpointing : true
    endpoint-silence-phones : "1:2:3:4:5:6:7:8:9:10"
    chunk-length-in-secs: 0.2
    #acoustic-scale: 0.083
    #traceback-period-in-secs: 0.2
    #num-nbest: 10
    #Additional functionality that you can play with:
    #lm-fst:  test/models/english/fisher_nnet_a_gpu_online/G.fst
    #big-lm-const-arpa: test/models/english/fisher_nnet_a_gpu_online/G.carpa
    phone-syms: test/models/english/fisher_nnet_a_gpu_online/phones.txt
    word-boundary-file: test/models/english/fisher_nnet_a_gpu_online/word_boundary.int
    do-phone-alignment: true
out-dir: tmp

use-vad: False
silence-timeout: 10

# Just a sample post-processor that appends "." to the hypothesis
post-processor: perl -npe 'BEGIN {use IO::Handle; STDOUT->autoflush(1);} s/(.*)/\1./;'

# A sample full post-processor that adds a confidence score to the 1-best hyp and deletes the other n-best hyps
full-post-processor: ./post_processor.py

logging:
    version : 1
    disable_existing_loggers: False
    formatters:
        simpleFormater:
            format: '%(asctime)s - %(levelname)7s: %(name)10s: %(message)s'
            datefmt: '%Y-%m-%d %H:%M:%S'
    handlers:
        console:
            class: logging.StreamHandler
            formatter: simpleFormater
            level: DEBUG
    root:
        level: DEBUG
        handlers: [console]
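To rule out a broken model path as the cause, a quick throwaway check like the one below can confirm that every file referenced in the decoder section exists (sketch only; it assumes PyYAML is installed and the config is saved as fisher_english_nnet2.yaml):

# Throwaway sanity check (not part of the server): load the config and
# flag any file-valued decoder property that points at a missing file.
import os
import yaml

with open("fisher_english_nnet2.yaml") as f:
    config = yaml.safe_load(f)

for key, value in config["decoder"].items():
    # Only string values that look like paths are checked.
    if isinstance(value, str) and "/" in value and not os.path.exists(value):
        print("missing file for decoder property %r: %s" % (key, value))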

post_processor.py

import sys
import json
import logging
from math import exp

def post_process_json(str):
    try:
        event = json.loads(str)
        if "result" in event:
            if len(event["result"]["hypotheses"]) > 1:
                likelihood1 = event["result"]["hypotheses"][0]["likelihood"]
                likelihood2 = event["result"]["hypotheses"][1]["likelihood"]
                confidence = likelihood1 - likelihood2
                confidence = 1 - exp(-confidence)
            else:
                confidence = 1.0e+10;
            event["result"]["hypotheses"][0]["confidence"] = confidence

            event["result"]["hypotheses"][0]["transcript"] += "."
            del event["result"]["hypotheses"][1:]
        return json.dumps(event)
    except:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        logging.error("Failed to process JSON result: %s : %s " % (exc_type, exc_value))
        return str

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)8s %(asctime)s %(message)s ")

    lines = []
    while True:
        l = sys.stdin.readline()
        if not l: break # EOF
        if l.strip() == "":
            if len(lines) > 0:
                result_json = post_process_json("".join(lines))
                print result_json
                print
                sys.stdout.flush()
                lines = []
        else:
            lines.append(l)

    if len(lines) > 0:
        result_json = post_process_json("".join(lines))
        print result_json
        lines = []
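To see whether the post-processor itself is swallowing events, a standalone harness along these lines (illustrative, not part of the repo) can push one fake event through it and print whatever comes back; note the script above uses Python 2 print statements, so it is invoked with python2 here:

# Illustrative test harness: feed one blank-line-terminated JSON event into
# the post-processor over stdin and show its stdout/stderr.  The fake event
# mirrors the fields the script reads (result, hypotheses, likelihood).
import json
import subprocess

fake_event = {
    "status": 0,
    "result": {
        "final": True,
        "hypotheses": [
            {"transcript": "hello world", "likelihood": 100.0},
            {"transcript": "hello word", "likelihood": 95.0},
        ],
    },
}

proc = subprocess.run(
    ["python2", "post_processor.py"],       # adjust interpreter/path as needed
    input=json.dumps(fake_event) + "\n\n",  # events end with a blank line
    capture_output=True,
    text=True,
)
print("stdout:", proc.stdout)
print("stderr:", proc.stderr)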
alumae commented 8 years ago

If you disable the post-processor, it doesn't stall?