asahi417 / lmppl

Calculate perplexity on a text with pre-trained language models. Supports MLMs (e.g. DeBERTa), recurrent LMs (e.g. GPT-3), and encoder-decoder LMs (e.g. Flan-T5).
MIT License

A quite large perplexity issue #5

Closed gotutiyan closed 1 year ago

gotutiyan commented 1 year ago

Hi, thank you for developing lmppl.

I have a question about unexpectedly large perplexity values.

I installed lmppl and ran the commands described in the README as follows, but get_perplexity() returns extremely large values. Is there something wrong with my procedure?

>>> import lmppl
>>> scorer = lmppl.LM('gpt2')
Using pad_token, but it is not set yet.
>>> text = [
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.',
    'sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.'
]
>>> ppl = scorer.get_perplexity(text)
100%|██████████| 1/1 [00:00<00:00,  3.03it/s]
>>> ppl
[4.2328431180493815e+43, 4.732356477497072e+43] # <-- These values are extremely large; something seems to be wrong.

Versions of some modules in my environment:

Thank you.

asahi417 commented 1 year ago

Hi, thank you so much for finding this issue! Models such as GPT-2 and OPT do not have a padding token by default, so I add one when loading the model (https://github.com/asahi417/lmppl/blob/main/lmppl/ppl_recurrent_lm.py#L70). Because that padding token is added post hoc, its logit could become large, and that blew up the perplexity in the end. I fixed it by excluding the newly added padding token when computing the negative log-likelihood, and it now produces reliable scores.
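For anyone still on an older version, here is a minimal sketch (not the exact lmppl implementation) of how padded positions can be excluded from the loss with plain transformers. It reuses the EOS token as the pad token, which avoids introducing a new embedding, and masks padded positions via the attention mask before averaging:

```python
# Minimal sketch: batch perplexity for GPT-2 that ignores padding in the loss.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# GPT-2 has no pad token by default; reusing EOS avoids adding a new token.
tokenizer.pad_token = tokenizer.eos_token

texts = [
    "sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy.",
    "sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am sad.",
]
enc = tokenizer(texts, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**enc).logits  # (batch, seq_len, vocab)

# Shift so that position i predicts token i+1.
shift_logits = logits[:, :-1, :]
shift_labels = enc["input_ids"][:, 1:]
shift_mask = enc["attention_mask"][:, 1:].float()  # 0 at padded positions

nll = torch.nn.functional.cross_entropy(
    shift_logits.transpose(1, 2), shift_labels, reduction="none"
)  # (batch, seq_len - 1)

# Average only over real tokens, then exponentiate.
ppl = torch.exp((nll * shift_mask).sum(dim=1) / shift_mask.sum(dim=1))
print(ppl.tolist())  # should be on the order of tens, not 1e+43
```

With the fix, scorer.get_perplexity(text) should return values in the same ballpark as this sketch.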

asahi417 commented 1 year ago

I also double-checked against the perplexity computed following the Hugging Face guide (https://huggingface.co/docs/transformers/perplexity) and confirmed that the values from lmppl match those produced with the guide.
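For reference, a quick way to reproduce that cross-check on a single sentence (one unpadded forward pass, in the spirit of the Hugging Face guide rather than its exact sliding-window setup) could look like the sketch below. The two numbers may differ slightly depending on how lmppl tokenizes and aggregates, but they should be of similar, moderate magnitude rather than 1e+43:

```python
# Cross-check lmppl against a single unpadded forward pass through GPT-2.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

import lmppl

text = "sentiment classification: I dropped my laptop on my knee, and someone stole my coffee. I am happy."

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # With labels, the model returns the mean token-level cross-entropy as `loss`.
    loss = model(**enc, labels=enc["input_ids"]).loss
reference_ppl = math.exp(loss.item())

scorer = lmppl.LM("gpt2")
lmppl_ppl = scorer.get_perplexity([text])[0]

print(reference_ppl, lmppl_ppl)
```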