I'm trying to calculate perplexity using the checkpoint model provided here: https://github.com/allenai/bilm-tf#can-you-provide-the-tensorflow-checkpoint-from-training
But the numbers it returns look too large and random.
This is the code I run (based on the tests in test_training.py):
My test file:

i like cookies .
my hands are dirty .
i like cookies .
how about a nice steak ?
leaves are falling in september .
my name is john snow .
i like chicken soup .
i want to play in the snow .
i like cookies .

And I'm getting batch perplexities between 2 and 20,000 (on a non-synthetic test I get values in the millions).
Is this normal? Does it make sense to use the log of that value?
I'm running it on Ubuntu with the latest code in master, on CPU.

Sorry, just found out these values are fine. It bothered me that they vary on the same sentence, but the average perplexity of all the words in it doesn't change much.
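A minimal sketch of why the per-sentence numbers swing so much while the averaged per-word value stays stable. The per-token log-probabilities below are made up for illustration, not real bilm-tf output: perplexity is the exponential of the mean negative log-probability per token, so a single rare token can blow up one sentence's perplexity, while pooling all token losses before exponentiating (i.e. working in log space) gives a much steadier corpus-level figure.

```python
import math

# Hypothetical per-token natural-log probabilities for three sentences
# (stand-ins for what a language model forward pass would return).
sentences = [
    [-1.2, -0.4, -3.0],          # a short, ordinary sentence
    [-0.9, -2.5, -0.3, -1.1],    # another ordinary sentence
    [-6.0, -0.2],                # contains one very unlikely token
]

def perplexity(log_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Per-sentence perplexities vary a lot...
per_sentence = [perplexity(s) for s in sentences]

# ...but the corpus-level value, computed from the pooled token losses,
# is far more stable, which is why averaging in log space is the norm.
all_tokens = [lp for s in sentences for lp in s]
corpus_ppl = perplexity(all_tokens)
```

This is why taking the log (or equivalently reporting the average per-word cross-entropy) is a reasonable way to get a number that does not vary wildly from batch to batch.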