Segmentation fault in BrainSpeechDecoder::Rescore()

donatojb commented 8 months ago

I am running this notebook and I got a segmentation fault in the N-gram decoding loop. The line that is triggering the segfault is this one. I downloaded the N-gram graphs from here and compiled following these instructions. Running the code with rescore=False when calling lmDecoderUtils.lm_decode works well.

This example produces the segmentation fault:

import neuralDecoder.utils.lmDecoderUtils as lmDecoderUtils

lmDir = './languageModel'
ngramDecoder = lmDecoderUtils.build_lm_decoder(
    lmDir,
    acoustic_scale=0.5,
    nbest=100,
    beam=18
)

ngramDecoder.Rescore()

The otuput in the terminal is the following:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0117 12:34:17.155279 762859 brain_speech_decoder.h:52] Reading fst ./languageModel/TLG.fst
I0117 12:36:37.021819 762859 brain_speech_decoder.h:81] Reading symbol table ./languageModel/words.txt
Fatal Python error: Segmentation fault

Current thread 0x00007ff887cfb740 (most recent call first):
  File "/home/gridsan/dbeneto/TFG/speechBCI/debug.py", line 11 in <module>
Segmentation fault

cffan commented 8 months ago

Thanks for all the info. We'll look into this soon. One question: did you use the pretrained RNN or your custom model?

donatojb commented 8 months ago

I used the pretrained RNN

cffan commented 8 months ago

I couldn't reproduce the segfault in this notebook. One thing I noticed is that if lmDecoderUtils.lm_decode is interrupted while running, it may crash the next time it runs because corrupted internal state.

The segfault in the example code you provided is because you cannot call Rescore before running the actually decoding. The correct code should be:

lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty)
ngramDecoder.FinishDecoding()
ngramDecoder.Rescore()

Can you also list what's inside your languageModel folder?

LeonHermann322 commented 6 months ago

I'm facing the same problem at the moment. If I'm deactivating rescore it executes as expected, but with rescore active it fails during the rescoring. The language model was downloaded from the most recent language model here. Its contents are G.fst L.fst LG.fst T.fst TLG.fst lexicon_numbers.txt tokens.txt units.txt words.txt. I'm executing this script for evaluation.

lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty) ngramDecoder.FinishDecoding() ngramDecoder.Rescore()

The execution order as mentioned in your previous comment should be correct as this script is executing the function neuralDecoder.utils.lmDecoderUtils.lm_decode for this.

cffan commented 6 months ago

@LeonHermann322, did you use the pre-trained RNN or your custom model?

LeonHermann322 commented 6 months ago

I'm using a custom model, but the model output should be structured in the same way. Since @donatojb was using the pretrained RNN I assumed that there could be a bug independent from the model. Right now, I'm working on making sure that the error is not caused by our model output.

cffan commented 6 months ago

I'm facing the same problem at the moment. If I'm deactivating rescore it executes as expected, but with rescore active it fails during the rescoring. The language model was downloaded from the most recent language model here. Its contents are G.fst L.fst LG.fst T.fst TLG.fst lexicon_numbers.txt tokens.txt units.txt words.txt. I'm executing this script for evaluation.

lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty) ngramDecoder.FinishDecoding() ngramDecoder.Rescore()

The execution order as mentioned in your previous comment should be correct as this script is executing the function neuralDecoder.utils.lmDecoderUtils.lm_decode for this.

Actually it looks like you downloaded the 3gram LM? Only the 5gram LM supports rescore. If you want to use the 3gram, set this to False.

We'll clarify this in the README.

donatojb commented 6 months ago

I was also using the 3gram LM, so that was the problem, thank you. Is there a download link for the 5gram model?

LeonHermann322 commented 6 months ago

Ah, that has to be it then. Thank you for your replies, that saves us a lot of time :)!

cffan commented 6 months ago

@donatojb, you can download the 5gram LM from the same website where you downloadd the 3gram LM.

fwillett / speechBCI

Segmentation fault in BrainSpeechDecoder::Rescore() #13