Open rlevy opened 4 years ago
This seems wrong: why do we get such a huge surprisal for sentence-end after a period? Input file was:
This is a short sentence.
Command & output:
$ lm-zoo get-surprisals ngram ~/tmp/sentences.txt
reading /opt/srilm/checkpoint/model.lm in binary format
sentence_id token_id token surprisal
1 1 this 5.29354
1 2 is 3.1117
1 3 a 2.92768
1 4 short 9.45191
1 5 sentence 12.0459
1 6 . 3.6674900000000004
1 7 </s> 28.1537
Doesn't happen for GRNN (the -0.0 is a tiny bit funny but probably not worth worrying about):
$ lm-zoo get-surprisals GRNN ~/tmp/sentences.txt
sentence_id token_id token surprisal
1 1 This 0.0
1 2 is 1.7249029999999999
1 3 a 1.4204510000000001
1 4 short 8.294603
1 5 sentence 10.343164
1 6 . 3.59838
1 7 <eos> -0.0
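For anyone who wants to compare the two outputs side by side, here is a rough sketch that parses the tables above and shows how unlikely each model considers its most surprising token. `parse_surprisals` and `implied_probability` are hypothetical helpers written for this illustration, not part of lm-zoo. The conversion assumes surprisals are reported in bits (log base 2), which the output above does not confirm.

```python
# Surprisal tables copied from the two runs above (the "reading ..." log line is omitted).
NGRAM_OUTPUT = """\
sentence_id token_id token surprisal
1 1 this 5.29354
1 2 is 3.1117
1 3 a 2.92768
1 4 short 9.45191
1 5 sentence 12.0459
1 6 . 3.6674900000000004
1 7 </s> 28.1537
"""

GRNN_OUTPUT = """\
sentence_id token_id token surprisal
1 1 This 0.0
1 2 is 1.7249029999999999
1 3 a 1.4204510000000001
1 4 short 8.294603
1 5 sentence 10.343164
1 6 . 3.59838
1 7 <eos> -0.0
"""


def parse_surprisals(text):
    """Parse whitespace-separated rows into (sentence_id, token_id, token, surprisal)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        sentence_id, token_id, token, surprisal = line.split()
        rows.append((int(sentence_id), int(token_id), token, float(surprisal)))
    return rows


def implied_probability(surprisal, base=2.0):
    """Invert surprisal = -log_base(p). ASSUMPTION: base 2, i.e. surprisal in bits."""
    return base ** (-surprisal)


for name, output in [("ngram", NGRAM_OUTPUT), ("GRNN", GRNN_OUTPUT)]:
    rows = parse_surprisals(output)
    # Find the token the model found most surprising.
    _, _, token, surprisal = max(rows, key=lambda row: row[3])
    print(f"{name}: highest surprisal on {token!r}: {surprisal} "
          f"(implied p = {implied_probability(surprisal):.3g})")
```

Under that base-2 assumption, a surprisal of 0.0 or -0.0 corresponds to a probability of exactly 1, so the GRNN values for "This" and `<eos>` would mean the model treats those tokens as certain, while the ngram model treats `</s>` after a period as extremely unlikely.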
Isn't it strange that "This" has a surprisal of 0.0 as well? @rlevy, I haven't seen any reaction in the issues or in the chat (https://gitter.im/lm-zoo/community). Is this project still alive?