What steps will reproduce the problem?
1. Create an LM with estimate-ngram using the -unk and -eval-perp parameters
2. Run evaluate-ngram with -eval-perp on the same LM
3. Perplexity results differ
What is the expected output? What do you see instead?
$ evaluate-ngram -lm rlst8-similar.lm -eval-perp "$TRANSCRIPT_CONT, $TRANSCRIPT_SENT"
0.001 Loading LM rlst8-similar.lm...
7.262 Perplexity Evaluations:
7.262 Loading eval set /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus...
7.318 /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus 385.071
7.322 Loading eval set /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences...
7.376 /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences 312.224
$ estimate-ngram -unk 1 -vocab $VOCAB_AUGMENTED -text $SENTENCE_CORPUS -wl $LM_SIMILAR -eval-perp "$TRANSCRIPT_CONT, $TRANSCRIPT_SENT"
0.001 Replace unknown words with <unk>...
0.001 Loading vocab rlst8-merged-vocab.txt...
0.013 Loading corpus sentences.similar.corpus...
10.127 Smoothing[1] = ModKN
10.127 Smoothing[2] = ModKN
10.127 Smoothing[3] = ModKN
10.127 Set smoothing algorithms...
10.243 Estimating full n-gram model...
10.459 Saving LM to rlst8-similar.lm...
14.192 Perplexity Evaluations:
14.192 Loading eval set /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus...
14.351 /data/src/sphinx/experiments/transcripts/rlst-transcript.corpus 377.913
14.359 Loading eval set /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences...
14.516 /data/src/sphinx/experiments/transcripts/rlst-transcript.sentences 307.090
I would expect the two sets of perplexity results to be the same.
The difference appears to arise from use of the "-unk" parameter. Without it
(i.e. when the LM excludes <unk>), the perplexity results from estimate-ngram
and evaluate-ngram are the same.
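To illustrate how an <unk>-related discrepancy of this kind could arise in principle, here is a minimal sketch with a toy unigram model. It is not MITLM code and not a diagnosis of the actual cause. The function name, the probabilities and the two out-of-vocabulary policies (skip the word, or score it as <unk>) are all assumptions made up for illustration. It only shows that two evaluators that treat out-of-vocabulary words differently will report different perplexities for the same model and text.

```python
import math


def perplexity(tokens, unigram_probs, unk_token=None):
    """Perplexity of `tokens` under a toy unigram model (hypothetical helper).

    unk_token=None: out-of-vocabulary words are skipped entirely, so they
    contribute neither to the log-probability sum nor to the word count.
    unk_token="<unk>": out-of-vocabulary words are scored as that token.
    """
    log_prob = 0.0
    count = 0
    for tok in tokens:
        if tok not in unigram_probs:
            if unk_token is None:
                continue  # policy A: ignore the unknown word
            tok = unk_token  # policy B: map the unknown word to <unk>
        log_prob += math.log(unigram_probs[tok])
        count += 1
    if count == 0:
        raise ValueError("no scorable tokens")
    return math.exp(-log_prob / count)


# Toy probabilities, invented for this example only.
probs = {"the": 0.5, "cat": 0.25, "<unk>": 0.25}
text = ["the", "cat", "the", "dog"]  # "dog" is out of vocabulary

ppl_skip = perplexity(text, probs)                    # policy A
ppl_unk = perplexity(text, probs, unk_token="<unk>")  # policy B
print(ppl_skip, ppl_unk)
```

If the text contains no out-of-vocabulary words, both policies give the same value, which would be consistent with the two tools agreeing when -unk is not used.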
What version of the product are you using? On what operating system?
r48
MacOS X 10.6.1
Please provide any additional information below.
Original issue reported on code.google.com by smarqu...@gmail.com on 4 Jun 2011 at 7:42