allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models

Perplexity returns very random values even after warmup #189

Closed · ykiprov closed 5 years ago

ykiprov commented 5 years ago

I'm trying to calculate perplexity using the model checkpoint provided here: https://github.com/allenai/bilm-tf#can-you-provide-the-tensorflow-checkpoint-from-training

But it looks like the returned numbers are too large and random.

This is the code I run (based on the tests in test_training.py):

from bilm.data import BidirectionalLMDataset
from bilm.training import load_vocab, load_options_latest_checkpoint, test

if __name__ == "__main__":
    # Load the training options and the latest checkpoint from the folder.
    data_folder = "/path/to/checkpoint/folder/"
    _options, _ckpt_file = load_options_latest_checkpoint(data_folder)
    # 50 is the maximum number of characters per token.
    _vocab = load_vocab(data_folder + 'vocab-2016-09-10.txt', 50)
    # Test data: one whitespace-tokenized sentence per line.
    prefix = "/path/to/test.txt"
    _data = BidirectionalLMDataset(prefix, _vocab, test=True)
    _perplexity = test(_options, _ckpt_file, _data, batch_size=1)

My test file:

i like cookies .
my hands are dirty .
i like cookies .
how about a nice steak ?
leaves are falling in september .
my name is john snow .
i like chicken soup .
i want to play in the snow .
i like cookies .

And I'm getting batch perplexity values between 2 and 20,000 (on a non-synthetic test set I get values in the millions).

Is this normal? Does it make sense to use the log of that value?

I'm running it on Ubuntu, on CPU, with the latest code from master.
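For reference, the log of a perplexity is the average negative log-likelihood per token (in nats), so it is often easier to compare across runs. A minimal sketch with hypothetical perplexity values (not bilm-tf output):

import math

# log(perplexity) = average negative log-likelihood, in nats per token
for ppl in (2.0, 150.0, 20000.0):
    print(f"perplexity {ppl:>8.1f}  ->  {math.log(ppl):5.2f} nats/token")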

ykiprov commented 5 years ago

Sorry, I just found out these values are fine. It bothered me how much they vary for the same sentence, but the average perplexity over all the words in a sentence doesn't change much.
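The stability of the average follows from how perplexity aggregates: corpus-level perplexity is the exponential of the token-weighted mean negative log-likelihood, so individual batch perplexities can swing widely while the weighted average in log space stays flat. A minimal sketch, assuming hypothetical per-batch perplexities and token counts (not bilm-tf internals):

import math

# Per-batch perplexities and how many tokens each batch contained
# (hypothetical values for illustration).
batch_ppls = [2.0, 350.0, 20000.0]
batch_tokens = [5, 6, 4]

# Sum the total negative log-likelihood over all tokens, then
# exponentiate the per-token mean. Averaging raw batch perplexities
# directly would NOT give the corpus-level perplexity.
total_nll = sum(n * math.log(p) for p, n in zip(batch_ppls, batch_tokens))
total_tokens = sum(batch_tokens)
corpus_ppl = math.exp(total_nll / total_tokens)
print(f"corpus perplexity: {corpus_ppl:.1f}")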