GPT Models Scoring error

VP007-py commented 3 years ago

I tried scoring sentences with the models mentioned here. Every model works fine except for gpt2-117m-en-cased and gpt2-345m-en-cased, which fail with the following error:

Traceback (most recent call last):
  File "sample.py", line 16, in <module>
    print(scorer.score_sentences(["Hello world!"]))
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 148, in score_sentences
    return self.score(corpus, **kwargs)[0]
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 396, in score
    dataset = self.corpus_to_dataset(corpus)
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 364, in corpus_to_dataset
    ids_masked = self._ids_to_masked(ids_original)
  File "/home/pandramish.vinay/mlm-scoring/src/mlm/scorers.py", line 329, in _ids_to_masked
    mask_token_id = self._vocab.token_to_idx[self._vocab.mask_token]
AttributeError: 'Vocab' object has no attribute 'mask_token'

Any fixes?

JulianSlzr commented 3 years ago

Thanks for filing the first issue! Sorry for the delayed response (didn't have e-mail notifications on 😓).

The issue is that GPT-2 is an autoregressive LM: it gives true log-likelihood scores and has no mask token, which is why MLMScorer fails when it looks up mask_token. You need to use LMScorer, not MLMScorer. The README was unclear; my mistake.
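
To make the distinction concrete, here is a minimal toy sketch in plain Python. It does not use the mlm-scoring library; the helper names (`masked_copies`, `prefixes`, `autoregressive_log_likelihood`) and the uniform `toy_next_token_prob` model are assumptions made up for illustration. It shows that masked-LM scoring needs one masked copy of the sentence per position, while autoregressive scoring only needs left contexts and sums log-probabilities via the chain rule.

```python
import math

# Made-up mask symbol for the toy example; real vocabularies define their own.
MASK = "[MASK]"


def masked_copies(tokens, mask_token):
    """Masked-LM style input: one copy per position, with that position masked."""
    return [tokens[:i] + [mask_token] + tokens[i + 1:] for i in range(len(tokens))]


def prefixes(tokens):
    """Autoregressive style input: (left context, next token) pairs."""
    return [(tokens[:i], tokens[i]) for i in range(len(tokens))]


def autoregressive_log_likelihood(tokens, next_token_prob):
    """Chain rule: sum of log P(token | all tokens to its left)."""
    return sum(math.log(next_token_prob(ctx, tok)) for ctx, tok in prefixes(tokens))


def toy_next_token_prob(context, token):
    # Hypothetical stand-in for a GPT-2-like model:
    # uniform over an imaginary 4-word vocabulary, ignoring the context.
    return 0.25


tokens = ["Hello", "world", "!"]

# A masked LM would be queried once per masked copy.
for copy in masked_copies(tokens, MASK):
    print(copy)

# An autoregressive LM never sees a mask token, only left contexts.
for ctx, tok in prefixes(tokens):
    print(ctx, "->", tok)

print(autoregressive_log_likelihood(tokens, toy_next_token_prob))
```

With the toy model every token gets probability 0.25, so the score is simply three times log(0.25). A real autoregressive model would replace `toy_next_token_prob` with its own conditional probabilities.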

I've updated the README and added pre-emptive error messages to MLMScorer, LMScorer, etc.; hope this helps.
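
As a rough idea of what a pre-emptive check of this kind could look like (this is not the code that was added to the repository; `ToyVocab` and `require_mask_token` are hypothetical names used only for this sketch), a scorer can verify that the vocabulary defines a mask token before it tries to look it up, and raise a readable error instead of an AttributeError:

```python
class ToyVocab:
    """Hypothetical stand-in for a vocabulary object; not the library's Vocab class."""

    def __init__(self, tokens, mask_token=None):
        self.token_to_idx = {tok: i for i, tok in enumerate(tokens)}
        # Mimic the situation in the traceback: vocabularies for
        # autoregressive models simply lack the attribute.
        if mask_token is not None:
            self.mask_token = mask_token


def require_mask_token(vocab, model_name):
    """Return the mask token id, or fail early with an actionable message."""
    mask_token = getattr(vocab, "mask_token", None)
    if mask_token is None:
        raise ValueError(
            f"{model_name!r} has no mask token, so it cannot be scored as a "
            "masked LM; use an autoregressive scorer (LMScorer) instead."
        )
    return vocab.token_to_idx[mask_token]


bert_like = ToyVocab(["[PAD]", "[MASK]", "hello"], mask_token="[MASK]")
gpt2_like = ToyVocab(["<|endoftext|>", "hello"])

print(require_mask_token(bert_like, "bert-like"))

try:
    require_mask_token(gpt2_like, "gpt2-like")
except ValueError as err:
    print(err)
```

Checking at construction time, rather than deep inside the masking step, means the user sees the mismatch between model and scorer before any data is processed.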

VP007-py commented 3 years ago

@JulianSlzr Thanks for the fix, and kudos on your awesome work!

awslabs / mlm-scoring
