awslabs / mlm-scoring

Python library & examples for Masked Language Model Scoring (ACL 2020)
https://www.aclweb.org/anthology/2020.acl-main.240/
Apache License 2.0

IndexError: too many indices for tensor of dimension 1 #8

Open mfelice opened 3 years ago

mfelice commented 3 years ago

Hi there,

I'm using the PyTorch implementation with bert-base-uncased and I get the following error when the sentence contains only one token:

Traceback (most recent call last):
  File "bert.py", line 28, in <module>
    print(scorer.score_sentences(["Hello"]))
  File ".../mlm-scoring/src/mlm/scorers.py", line 167, in score_sentences
    return self.score(corpus, **kwargs)[0]
  File ".../mlm-scoring/src/mlm/scorers.py", line 757, in score
    out = out[list(range(split_size)), token_masked_ids]
IndexError: too many indices for tensor of dimension 1
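The failure can be reproduced outside the library. This is a minimal sketch that rests on two assumptions: that the line linked below applied squeeze() to out[0], and that out[0] holds one row of vocabulary logits per masked copy of the sentence, with shape (split_size, 1, vocab_size). The helper name and the tensor shapes are hypothetical and are not taken from scorers.py.

```python
import torch

def select_correct_logits_squeeze(logits, token_masked_ids):
    """Hypothetical mimic of the assumed original code: squeeze(), then gather."""
    out = logits.squeeze()  # drops EVERY size-1 dimension, including a size-1 batch
    split_size = len(token_masked_ids)
    # Pick, for each masked copy, the logit of the correct token id.
    return out[list(range(split_size)), token_masked_ids]

vocab_size = 8

# Two masked copies: squeeze() leaves (2, 8), so the indexing works.
two_tokens = torch.randn(2, 1, vocab_size)
print(select_correct_logits_squeeze(two_tokens, [3, 5]).shape)

# One masked copy (one-token sentence): squeeze() leaves (8,), a 1-D tensor,
# so indexing it with two index lists raises IndexError.
one_token = torch.randn(1, 1, vocab_size)
try:
    select_correct_logits_squeeze(one_token, [3])
except IndexError as err:
    print("IndexError:", err)
```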

It works fine with MXNet MLMs, but I need to use a community model from HuggingFace.

Thanks!

mfelice commented 3 years ago

OK, I think I found the problem.

https://github.com/awslabs/mlm-scoring/blob/672729747432810f9bcb37149104124dd3cc4165/src/mlm/scorers.py#L727

should be changed to:

out = torch.reshape(out[0], (out[0].shape[0], -1))

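squeeze() removes every dimension of size 1, so for a one-token sentence (a single masked copy) it was also removing the batch dimension, which the indexing at line 757 needs to be preserved.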
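The proposed reshape can be checked in isolation. This is a minimal sketch, assuming out[0] holds hypothetical logits of shape (split_size, 1, vocab_size), one row per masked copy; the helper name and the toy tensors are illustrative only and are not the library's code.

```python
import torch

def select_correct_logits(logits, token_masked_ids):
    # As in the proposed fix: keep dim 0 (one row per masked copy), flatten the rest.
    out = torch.reshape(logits, (logits.shape[0], -1))
    split_size = out.shape[0]
    # Pick, for each masked copy, the logit of the correct token id.
    return out[list(range(split_size)), token_masked_ids]

vocab_size = 8
for split_size in (1, 3):
    # Deterministic toy logits so the selected values are easy to check.
    logits = torch.arange(split_size * vocab_size, dtype=torch.float32).reshape(
        split_size, 1, vocab_size
    )
    picked = select_correct_logits(logits, [2] * split_size)
    print(split_size, picked.tolist())
# 1 [2.0]
# 3 [2.0, 10.0, 18.0]
```

Flattening everything after dimension 0 keeps one row per masked copy whatever the batch size, whereas squeeze() cannot tell a size-1 batch dimension apart from any other size-1 dimension.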

DarrenAbramson commented 3 years ago

Hurray for publicly licensed software and donation of labour to the public good!