Hi
Adding to my previous posts in issue 19: I am trying to use the Google binary
(from Google Books) to get log probabilities of trigrams from some text. I am
getting NaN for the last trigram. Attached is the code for what I am trying to
do. I have slightly modified these files and added some System.out.println
calls to see the outputs.
The text I am testing with is "Hello how are you". It gives me the array
sent = [7380255 15474 152 26 45 7380256], where 7380255 is the start symbol and
7380256 is the stop symbol.
I first get the log probability of the bigram 7380255 15474 by passing
startpos as 0 and endpos as 2. After that I get the log probabilities of the
trigrams, beginning at startpos 0, with the code below:
    for (int i = 0; i <= sent.length - 3; i++) {
        System.out.println("Getting score from " + sent[i] + " to " + sent[i+2]);
        // endpos is exclusive, so [i, i+3) covers one trigram
        score = lm_.getLogProb(sent, i, i+3);
        System.out.println("score " + score);
        if (Float.isNaN(score))
            System.out.println("Returned NaN");
        else
            sentScore += score;
    }
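To make the positions concrete, here is a small standalone sketch that only lists the spans being scored for this sentence. It does not call the library at all. The class name TrigramWindows and the helper windows are made up for illustration, and it assumes endpos is exclusive, as in the loop above.

```java
import java.util.ArrayList;
import java.util.List;

class TrigramWindows {
    // Hypothetical helper, not part of the LM library: returns the
    // half-open [startpos, endpos) spans scored for a sentence of the
    // given length, i.e. one leading (order-1)-gram plus every full n-gram.
    static List<int[]> windows(int length, int order) {
        List<int[]> spans = new ArrayList<>();
        // Leading shorter n-gram that begins at the start symbol.
        if (order > 1 && length >= order - 1) {
            spans.add(new int[] {0, order - 1});
        }
        // Full n-grams, same bounds as the loop above.
        for (int i = 0; i <= length - order; i++) {
            spans.add(new int[] {i, i + order});
        }
        return spans;
    }

    public static void main(String[] args) {
        int[] sent = {7380255, 15474, 152, 26, 45, 7380256};
        for (int[] w : windows(sent.length, 3)) {
            System.out.println("[" + w[0] + ", " + w[1] + ") last word id "
                    + sent[w[1] - 1]);
        }
    }
}
```

For the six-token sentence this lists five spans, and only the final one, [3, 6), ends on the stop symbol 7380256.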
The problem occurs inside StupidBackoffLm, at the following line:
    probContext = localMap.getValueAndOffset(probContext, probContextOrder,
            ngram[i], scratch);
It happens only for the last trigram, when startpos is 3 and endpos is 6.
scratch.value comes back as -1 when ngram[i] is the end symbol, 7380256,
and this results in a NaN log probability.
I tried the same with scoreSentence, and it gives the same problem.
Can you please help me understand what mistake I am making?
Thanks
Regards
Debanjan
Original issue reported on code.google.com by b.deban...@gmail.com on 24 Mar 2014 at 11:36