Open donatojb opened 8 months ago
Thanks for all the info. We'll look into this soon. One question: did you use the pretrained RNN or your custom model?
I used the pretrained RNN
I couldn't reproduce the segfault in this notebook. One thing I noticed is that if `lmDecoderUtils.lm_decode`
is interrupted while running, it may crash the next time it runs because of corrupted internal state.
The segfault in the example code you provided happens because you cannot call `Rescore`
before running the actual decoding. The correct code should be:

```python
lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty)
ngramDecoder.FinishDecoding()
ngramDecoder.Rescore()
```
Can you also list what's inside your `languageModel` folder?
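The crash comes down to a call-order invariant that the native decoder does not check. This pure-Python mock (not the real `lm_decoder` bindings; method names merely mirror those in this thread) sketches why the wrong order fails:

```python
# Mock of the n-gram decoder's call-order contract (NOT the real bindings).
# The native decoder does not validate call order, so violating it reads
# uninitialized internal state and segfaults; this mock raises instead.
class MockNgramDecoder:
    def __init__(self):
        self._decoded = False
        self._finished = False

    def DecodeNumpy(self, logits, logPriors, blankPenalty):
        # Consume one chunk of acoustic logits.
        self._decoded = True

    def FinishDecoding(self):
        if not self._decoded:
            raise RuntimeError("FinishDecoding() called before DecodeNumpy()")
        self._finished = True

    def Rescore(self):
        # Rescoring requires a finished decode; the real binding would
        # segfault here, the mock raises a Python error instead.
        if not self._finished:
            raise RuntimeError("Rescore() called before FinishDecoding()")
        return "rescored"

dec = MockNgramDecoder()
dec.DecodeNumpy(logits=[[0.1, 0.9]], logPriors=[0.0, 0.0], blankPenalty=0.0)
dec.FinishDecoding()
print(dec.Rescore())  # prints "rescored"
```

Calling `Rescore()` on a fresh `MockNgramDecoder` raises immediately, which is the Python-level analogue of the segfault reported above.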
I'm facing the same problem at the moment. If I deactivate rescore, it executes as expected, but with rescore active it fails during the rescoring step.
I downloaded the most recent language model from here. Its contents are:

```
G.fst  L.fst  LG.fst  T.fst  TLG.fst  lexicon_numbers.txt  tokens.txt  units.txt  words.txt
```

I'm executing this script for evaluation:
```python
lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty)
ngramDecoder.FinishDecoding()
ngramDecoder.Rescore()
```
The execution order mentioned in your previous comment should be correct, as this script calls the function `neuralDecoder.utils.lmDecoderUtils.lm_decode` for this.
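Before digging further, one cheap sanity check is that the `languageModel` folder actually contains every file listed above. A small sketch (the expected list is copied from the folder contents quoted in this thread; the helper name is made up, and the 5-gram package may ship different files):

```python
from pathlib import Path

# Files observed in the downloaded languageModel folder (from this thread);
# the 5-gram package may differ.
EXPECTED = {"G.fst", "L.fst", "LG.fst", "T.fst", "TLG.fst",
            "lexicon_numbers.txt", "tokens.txt", "units.txt", "words.txt"}

def missing_lm_files(lm_dir):
    """Return the sorted names from EXPECTED that are absent from lm_dir."""
    d = Path(lm_dir)
    present = {p.name for p in d.iterdir()} if d.is_dir() else set()
    return sorted(EXPECTED - present)
```

An empty return value means all expected files are present; anything else points at an incomplete download or a wrong path rather than a decoder bug.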
@LeonHermann322, did you use the pre-trained RNN or your custom model?
I'm using a custom model, but the model output should be structured in the same way. Since @donatojb was using the pretrained RNN, I assumed there could be a bug independent of the model. Right now, I'm working on making sure that the error is not caused by our model output.
> I'm facing the same problem at the moment. If I deactivate rescore, it executes as expected, but with rescore active it fails during the rescoring step. I downloaded the most recent language model from here. Its contents are `G.fst L.fst LG.fst T.fst TLG.fst lexicon_numbers.txt tokens.txt units.txt words.txt`. I'm executing this script for evaluation: `lm_decoder.DecodeNumpy(ngramDecoder, logits, logPriors, blankPenalty)`, `ngramDecoder.FinishDecoding()`, `ngramDecoder.Rescore()`. The execution order mentioned in your previous comment should be correct, as this script calls the function `neuralDecoder.utils.lmDecoderUtils.lm_decode` for this.
Actually, it looks like you downloaded the 3gram LM. Only the 5gram LM supports rescoring. If you want to use the 3gram LM, set this to `False`.
We'll clarify this in the README.
I was also using the 3gram LM, so that was the problem, thank you. Is there a download link for the 5gram model?
Ah, that has to be it then. Thank you for your replies, that saves us a lot of time :)!
@donatojb, you can download the 5gram LM from the same website where you downloaded the 3gram LM.
I am running this notebook and I got a segmentation fault in the N-gram decoding loop. The line that triggers the segfault is this one. I downloaded the N-gram graphs from here and compiled them following these instructions. Running the code with `rescore=False` when calling `lmDecoderUtils.lm_decode` works well. This example produces the segmentation fault:
The output in the terminal is the following: