alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

RNN LM Rescoring #82

Open jin1004 opened 7 years ago

jin1004 commented 7 years ago

Hi, I just wanted to ask, is there any way to rescore the lattice using recurrent neural network language model with the current kaldi-gstreamer setup?

Thanks a lot!

alumae commented 7 years ago

It's theoretically possible using the post-processing framework and n-best lists, but it would be quite complicated.
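For context, that framework is wired up in the worker's yaml config. A rough sketch, assuming the layout of the sample configs in this repo (verify the key names against your own config):

```yaml
decoder:
    # The nnet2 online decoder plugin can emit an n-best list instead of
    # just the single best path; you need more than one hypothesis for
    # any rescoring to matter. Property name per the plugin docs.
    num-nbest: 10
# The script receives each result as JSON on stdin and must write the
# (possibly modified) JSON back to stdout.
full-post-processor: ./sample_full_post_processor.py
```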

calderma commented 7 years ago

@jin1004 I'm not sure what RNN library you're using, but if it has Python bindings you could maybe do this:

  1. create a new version of the sample_full_post_processor.py file
  2. in that file, import the rescoring method from your RNNLM library (along with any other dependencies it needs). In this example we will call the method "rescore_sentence" and assume that it returns a likelihood for the given sentence
  3. find the line in the post_process_json method that reads "if len(event["result"]["hypotheses"]) > 1:" and insert a line after it
  4. on that line write this (all on one line): event['result']['hypotheses'] = sorted(event['result']['hypotheses'], key=lambda x: rescore_sentence(x['transcript']), reverse=True) (a full sketch follows this list)
  5. save the file
  6. in your xxx.yaml file, replace the reference to "sample_full_post_processor.py" with the name of your new file
  7. save that file and make sure your worker is using it
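Putting the pieces together, here's a minimal sketch of what the modified file could look like. "my_rnnlm" and "rescore_sentence" are placeholders for your RNNLM library's Python bindings, and the stdin/stdout loop is modeled on sample_full_post_processor.py rather than copied from it:

```python
# Hypothetical sketch of a modified sample_full_post_processor.py.
# rescore_sentence() is assumed to return a higher-is-better
# likelihood for a given sentence.
import json
import sys

from my_rnnlm import rescore_sentence  # hypothetical import


def post_process_json(msg):
    event = json.loads(msg)
    if "result" in event and len(event["result"]["hypotheses"]) > 1:
        # Reorder the n-best list so the RNNLM's preferred hypothesis
        # comes first; downstream consumers typically take hypotheses[0].
        event["result"]["hypotheses"] = sorted(
            event["result"]["hypotheses"],
            key=lambda x: rescore_sentence(x["transcript"]),
            reverse=True)
    return json.dumps(event)


if __name__ == "__main__":
    # The worker pipes one JSON event per line to this script's stdin
    # and reads the post-processed event back from stdout.
    for line in sys.stdin:
        print(post_process_json(line))
        sys.stdout.flush()
```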

it should now return the hypothesis transcript with the highest likelihood according to your RNNLM. That's probably the simplest way to do it, although you would obviously need to add checks that the model loads correctly, the library can be imported, etc., so that a failure doesn't break the whole program.

If you are using a library without Python bindings (I don't think faster-rnnlm has any, for example), then you would probably need to spawn a subprocess or something similar, which would slow things down and potentially introduce additional complications. If speed isn't an urgent concern, that approach could still work; a sketch is below.
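Something along these lines, where the "rnnlm-score" binary, its flags, and its output format are all assumptions to be replaced with whatever your toolkit actually provides:

```python
# Hypothetical sketch of rescoring via a subprocess.
import subprocess


def rescore_sentence(transcript):
    # One process launch per sentence is simple but slow; batching all
    # n-best hypotheses into a single call would amortize the startup cost.
    result = subprocess.run(
        ["rnnlm-score", "--model", "rnnlm.model"],  # hypothetical CLI
        input=transcript,
        capture_output=True,
        text=True,
        check=True)
    # Assume the binary prints a single likelihood on stdout.
    return float(result.stdout.strip())
```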

I just made this up off the cuff, so if there are any problems or I misunderstood what you are trying to do, let me know.

jin1004 commented 7 years ago

@calderma Thank you so much! My RNN language model is still in training. I will test it with the method you described and let you know if it works.

Umar17 commented 5 years ago

@jin1004 Did the proposed solution of @calderma work?

LiamLonergan commented 2 years ago

Does anyone have a solution to this?