facebookresearch / SentEval

A python tool for evaluating the quality of sentence embeddings.

Bizarre "dimension out of range" error #8

Closed allenanie closed 7 years ago

allenanie commented 7 years ago

Hi, I've been running SentEval just fine for a couple of weeks, and today, after transferring to a new machine (with PyTorch 0.2), all of a sudden I can't evaluate on TREC anymore.

This is not a problem for any other tasks, so I'm wondering why. Does this mean my TREC data is corrupted or is this a problem with PyTorch 0.2?

How come all other evaluation tasks (that all use classifier.py) are fine except TREC...does anyone have any suggestions?

2017-08-15 20:52:30,838 : ***** Transfer task : TREC *****

2017-08-15 20:52:35,352 : Found 9548(/9767) words with glove vectors
2017-08-15 20:52:35,352 : Vocab size : 9548
2017-08-15 20:52:36,461 : Computed train embeddings
2017-08-15 20:52:36,577 : Computed test embeddings
2017-08-15 20:52:36,578 : Training pytorch-LogReg with 10-fold cross-validation
2017-08-15 20:55:49,459 : [('reg:1e-05', 80.72), ('reg:0.0001', 80.76), ('reg:0.001', 80.7), ('reg:0.01', 80.39)]
2017-08-15 20:55:49,460 : Cross-validation : best param found is reg = 0.0001 with score 80.76
2017-08-15 20:55:49,460 : Evaluating...
Traceback (most recent call last):
  File "model_run.py", line 156, in <module>
    tf.app.run()
  File "/home/xx/miniconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "model_run.py", line 150, in main
    results_transfer = se.eval(transfer_tasks)
  File "/home/xx/Documents/SentEval/senteval.py", line 56, in eval
    self.results = {x:self.eval(x) for x in name}
  File "/home/xx/Documents/SentEval/senteval.py", line 56, in <dictcomp>
    self.results = {x:self.eval(x) for x in name}
  File "/home/xx/Documents/SentEval/senteval.py", line 91, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "/home/xx/Documents/SentEval/trec.py", line 76, in run
    devacc, testacc, _ = clf.run()
  File "/home/xx/Documents/SentEval/tools/validation.py", line 159, in run
    yhat = clf.predict(self.test['X'])
  File "/home/xx/Documents/SentEval/tools/classifier.py", line 137, in predict
    yhat = np.append(yhat, output.data.max(1)[1].squeeze(1).cpu().numpy())
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)
aconneau commented 7 years ago

It's due to a change in the new pytorch.

For pytorch>=0.2, they decided that reduction operations such as .sum(k) and .max(k) would no longer keep the reduced dimension k as a singleton dimension (i.e. a dimension k of size 1) by default. For pytorch<0.2, we needed to call .squeeze(k) afterwards to drop it, but now that squeeze is effectively built into those operations. That's why you get "dimension out of range": the code calls squeeze(1) on the result of max(1), which now has only one dimension (dimension 0), so there is no dimension 1 left to squeeze.
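The shape change can be reproduced in a few lines. This is only a minimal sketch: the `scores` tensor and its 4x6 shape are made up for illustration, and the exception type is assumed to depend on the PyTorch version (RuntimeError in the traceback above, possibly IndexError on newer releases), so both are caught.

```python
import torch

# Hypothetical classifier scores: 4 examples, 6 classes (shape chosen for illustration).
scores = torch.randn(4, 6)

# Default since PyTorch 0.2 (keepdim=False): the reduced dimension is dropped.
indices = scores.max(1)[1]
print(indices.shape)

# Requesting keepdim=True keeps a size-1 dimension, like the pre-0.2 behaviour.
indices_kept = scores.max(1, keepdim=True)[1]
print(indices_kept.shape)

# squeeze(1) is valid on the 2-D result, but the 1-D result has no dimension 1.
try:
    indices.squeeze(1)
    squeeze_failed = False
except (IndexError, RuntimeError) as err:
    squeeze_failed = True
    print(err)
```

On PyTorch>=0.2 the first shape has one dimension and the second has two, and the final squeeze(1) fails with the same "dimension out of range" message as in the traceback.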

Since we should all move to pytorch 0.2, I'll make these changes asap (they should only require removing these "squeeze()" calls) and change the requirement to "Pytorch>=0.2" in the README.

A temporary fix for you would be to remove the "squeeze(1)" here: https://github.com/facebookresearch/SentEval/blob/master/senteval/tools/classifier.py#L144
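If the same code has to run on machines with different PyTorch versions, one alternative to removing the squeeze is to flatten the indices with view(-1), which should give a 1-D result whether or not max(1) kept the reduced dimension. This is only a sketch of that idea, not what SentEval's classifier.py does: `predicted_labels` is a hypothetical helper, and it takes a plain tensor rather than the Variable (`output.data`) used in the traceback.

```python
import numpy as np
import torch


def predicted_labels(output):
    """Return the argmax class of each row as a flat 1-D NumPy array (hypothetical helper)."""
    # max(1)[1] has shape (N,) when the reduced dim is dropped and (N, 1) when it
    # is kept; view(-1) flattens either case to (N,).
    return output.max(1)[1].view(-1).cpu().numpy()


# Accumulate predictions batch by batch, as the predict loop does with np.append.
yhat = np.array([])
batches = (
    torch.FloatTensor([[0.1, 0.9], [0.8, 0.2]]),
    torch.FloatTensor([[0.3, 0.7]]),
)
for batch_scores in batches:
    yhat = np.append(yhat, predicted_labels(batch_scores))
print(yhat)  # [1. 0. 1.]
```

Once every machine is on pytorch>=0.2, simply dropping the squeeze(1) as suggested above is the simpler change.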

aconneau commented 7 years ago

Made the modifications in https://github.com/facebookresearch/SentEval/commit/91f82751add3fea2cafc8afc16dc45ef72127850. Let me know if that fixes the issue; it should.