BPYap / BERT-WSD

[EMNLP-Findings 2020] Adapting BERT for Word Sense Disambiguation with Gloss Selection Objective and Example Sentences
https://arxiv.org/abs/2009.11795

Index out of range / CUDA issue #2

Closed by ClayGraubard 3 years ago

ClayGraubard commented 3 years ago

When trying to run the demo, I've run into three issues.

Firstly, when I run demo_model.py with any pre-trained model (in this case, bert_large-batch_size=128...), I get the following warning:

Some weights of the model checkpoint at DIR were not used when initializing BertWSD: ['similarity_loss_factor', 'ranking_loss_factor', 'similarity_linear.weight', 'similarity_linear.bias']

  • This IS expected if you are initializing BertWSD from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertWSD from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Special tokens have been added in the vocabulary, make sure the associated word embedding are fine-tuned or trained.

Then, when I run it on my CPU with the test sentence "He caught a [TGT] bass [TGT] yesterday.", it throws an error:

return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
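(Aside, for anyone landing here with the same traceback: an `IndexError` inside `torch.embedding` usually means a token id is larger than the embedding matrix has rows, which commonly happens when special tokens such as `[TGT]` are added to the tokenizer but the checkpoint's embedding table was never resized. A minimal pure-Python sketch of that mechanic; the `30522` vocab size is standard `bert-base-uncased`, used here only for illustration:)

```python
# Toy illustration: adding a special token like [TGT] grows the
# tokenizer's vocabulary, but the checkpoint's embedding table keeps
# its original number of rows, so looking up the new id fails.
vocab_size = 30522  # original bert-base-uncased vocab size
embedding_table = [[0.0] * 4 for _ in range(vocab_size)]  # stand-in for the weight matrix

tgt_token_id = vocab_size  # a newly added special token gets the next free id
hit_index_error = False
try:
    vec = embedding_table[tgt_token_id]  # same out-of-range lookup torch.embedding performs
except IndexError:
    hit_index_error = True

print("IndexError raised:", hit_index_error)  # → IndexError raised: True
```

With Hugging Face transformers, the usual remedy for this class of error is calling `model.resize_token_embeddings(len(tokenizer))` after `tokenizer.add_special_tokens(...)`; in this thread, though, the root cause turned out to be mismatched library versions.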

Finally, if I run it on my GPU (2080 Ti), it throws a ton of errors reading "Assertion srcIndex < srcSelectDimSize failed." It ends with this error: "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)"

Was wondering if you had any fixes to get it working. Thanks a ton!

Edit: Running with just "bert-large-uncased" worked. Is there still a way to use the included pre-trained models?

BPYap commented 3 years ago

Hi Clay,

Thanks for your interest in our work!

Unfortunately, I wasn't able to reproduce the errors you mentioned. I tested the script on my Windows 10 laptop as well as in a Google Colab notebook; it works without issue on both CPU and GPU. Did you have the same versions of the pytorch (1.3.1) and transformers (2.3.0) packages installed?

Regarding the first warning you saw: the unused weights, i.e. ['similarity_loss_factor', 'ranking_loss_factor', 'similarity_linear.weight', 'similarity_linear.bias'], are historical artifacts from our development code base. We tried different ways of modifying BERT for WSD, and those leftover weights are part of our failed experiments. They are not used in the final version of the pretrained models, so the warning can be safely ignored :)

ClayGraubard commented 3 years ago

Hey BPYap,

It appears I've resolved my issues; let's just say a whole lot of stupidity (or tiredness) played a role here :)

  1. Forgot to install CUDA, thinking "hey, my RTX drivers should be good enough!"
  2. Using transformers 4.3.0 instead of 2.3.0...LOL
  3. Using pytorch 1.8+. I was unable to find 1.3.1, but 1.4.0 with torchvision 0.5.0 worked
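For reference, the environment that ended up working can be pinned roughly like this (a sketch based on the versions named above; exact torch wheel names vary with your CUDA setup):

```shell
# Versions reported to work in this thread
# (the repo expects transformers 2.3.0; pytorch 1.4.0 + torchvision 0.5.0 worked in place of 1.3.1)
pip install torch==1.4.0 torchvision==0.5.0 transformers==2.3.0
```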

So yeah, it works now. Thank you!