AnantLabs / berkeleylm

Automatically exported from code.google.com/p/berkeleylm

Getting NAN on last trigram when using google binary #20

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi
Adding to my previous posts in issue 19: I am trying to use the Google binary
(from Google Books) to get log probabilities of trigrams from some text, and I am
getting NaN for the last trigram. Attached is the code of what I am trying to
do. I slightly modified these files and added some System.out.println calls to
see the outputs.

The text I am testing with is "Hello how are you", so the encoded sentence array
sent is [7380255 15474 152 26 45 7380256], where 7380255 is the start symbol and
7380256 is the stop symbol.
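To make the positions concrete, here is a small sketch that rebuilds that array by wrapping the word IDs with the start and stop symbols. The class and method names (EncodedSentence, wrap) are hypothetical, and the IDs are hard-coded from this post rather than looked up through berkeleylm.

```java
import java.util.Arrays;

// Sketch only: rebuilds the encoded sentence from the post. EncodedSentence and wrap
// are hypothetical names; the IDs are copied from the post, not read from berkeleylm.
class EncodedSentence {
    static final int START_ID = 7380255; // start symbol ID reported in the post
    static final int STOP_ID = 7380256;  // stop symbol ID reported in the post

    // Returns [START_ID, wordIds..., STOP_ID].
    static int[] wrap(int... wordIds) {
        int[] sent = new int[wordIds.length + 2];
        sent[0] = START_ID;
        System.arraycopy(wordIds, 0, sent, 1, wordIds.length);
        sent[sent.length - 1] = STOP_ID;
        return sent;
    }

    public static void main(String[] args) {
        // Word IDs for "Hello how are you", in the order given in the post.
        int[] sent = wrap(15474, 152, 26, 45);
        System.out.println(Arrays.toString(sent)); // [7380255, 15474, 152, 26, 45, 7380256]
        // The stop symbol is the last element, so only a window whose
        // endPos equals sent.length (6) contains it.
        System.out.println("stop symbol at index " + (sent.length - 1)); // stop symbol at index 5
    }
}
```

Since the stop symbol sits at index 5, it can only appear in a window with endPos = 6.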

I first get the log probability of the bigram 7380255 15474 by passing
startPos = 0 and endPos = 2. After that I get the log probabilities of the
trigrams, starting with startPos = 0, as in the code below:

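// Score each trigram window [i, i + 3) of the encoded sentence.
for (int i = 0; i <= sent.length - 3; i++) {
    System.out.println("Getting score from " + sent[i] + " to " + sent[i + 2]);
    score = lm_.getLogProb(sent, i, i + 3);
    System.out.println("score " + score);
    if (Float.isNaN(score)) {
        System.out.println("Returned NaN");
    } else {
        // Only finite scores are added to the running sentence total.
        sentScore += score;
    }
}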
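The following sketch lists the [startPos, endPos) windows that the loop above passes to getLogProb, without calling berkeleylm at all. TrigramWindows and windows are hypothetical names; the sentence array is the one from this post.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: enumerates the n-gram windows visited by the loop above.
// TrigramWindows and windows(...) are hypothetical; no berkeleylm calls are made.
class TrigramWindows {
    // Returns {startPos, endPos} pairs for every full window of the given order.
    static List<int[]> windows(int sentLength, int order) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i <= sentLength - order; i++) {
            result.add(new int[] {i, i + order});
        }
        return result;
    }

    public static void main(String[] args) {
        int[] sent = {7380255, 15474, 152, 26, 45, 7380256};
        for (int[] w : windows(sent.length, 3)) {
            // sent[endPos - 1] is the word whose probability the window scores.
            System.out.println("startPos=" + w[0] + " endPos=" + w[1]
                    + " last word=" + sent[w[1] - 1]);
        }
    }
}
```

This prints four windows, and only the last one (startPos=3, endPos=6) ends in 7380256, which matches the single failing trigram described below.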

The problem happens within StupidBackoffLm, in the following line:
probContext = localMap.getValueAndOffset(probContext, probContextOrder, ngram[i], scratch);
and only for the last trigram, where startPos is 3 and endPos is 6.
There scratch.value is -1 when ngram[i] is the stop symbol (7380256),
which results in a NaN log probability.
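One way a -1 lookup result can turn into NaN, assuming it is later treated as a count and passed to a logarithm, is sketched below. This is an assumption about the failure mode for illustration, not code taken from StupidBackoffLm; SentinelToNaN, naiveLogCount and guardedLogCount are hypothetical names.

```java
// Sketch only: shows that Math.log of a negative "not found" sentinel yields NaN.
// The scoring functions are hypothetical and are NOT berkeleylm code.
class SentinelToNaN {
    static final long NOT_FOUND = -1L; // sentinel, as reported for scratch.value

    // Hypothetical scorer that takes the log of a raw count.
    static float naiveLogCount(long count) {
        return (float) Math.log(count); // Math.log of a negative number is NaN
    }

    // Hypothetical guarded scorer: returns missingScore for a missing n-gram.
    static float guardedLogCount(long count, float missingScore) {
        if (count < 0) {
            return missingScore;
        }
        return (float) Math.log(count);
    }

    public static void main(String[] args) {
        System.out.println(naiveLogCount(NOT_FOUND)); // NaN
        System.out.println(guardedLogCount(NOT_FOUND, Float.NEGATIVE_INFINITY)); // -Infinity
    }
}
```

If this is what happens, the underlying question is why the stop symbol is not found in the model, since the guard only changes how the missing n-gram is reported.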

I tried the same with scoreSentence, and it shows the same problem.
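A single NaN term makes a plain float sum NaN, which would explain why a sentence-level score fails too, assuming it adds up per-position log probabilities. The scores below are made-up placeholders, not berkeleylm output, and NaNPropagation is a hypothetical name.

```java
// Sketch only: NaN propagation through a sum of log probabilities.
// The score values are hypothetical placeholders, not outputs of berkeleylm.
class NaNPropagation {
    // Plain sum: any NaN term makes the total NaN.
    static float sum(float[] scores) {
        float total = 0f;
        for (float s : scores) {
            total += s;
        }
        return total;
    }

    // Sum that skips NaN terms, like the loop in the post.
    static float sumSkippingNaN(float[] scores) {
        float total = 0f;
        for (float s : scores) {
            if (!Float.isNaN(s)) {
                total += s;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        float[] scores = {-2.5f, -1.0f, Float.NaN};
        System.out.println(sum(scores));            // NaN
        System.out.println(sumSkippingNaN(scores)); // -3.5
    }
}
```

Skipping NaN terms keeps the total finite but silently drops the stop-symbol probability, so it is a workaround rather than a fix.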

Can you please help me understand what mistake I am making?

Thanks
Regards
Debanjan

Original issue reported on code.google.com by b.deban...@gmail.com on 24 Mar 2014 at 11:36


GoogleCodeExporter commented 9 years ago
Any chance you can give me a command line and data set that reproduces the 
crash?

Original comment by adpa...@gmail.com on 7 Sep 2014 at 7:06