LorrinWWW / two-are-better-than-one

Code associated with the paper **Two are Better Than One: Joint Entity and Relation Extraction with Table-Sequence Encoders**, at EMNLP 2020

RuntimeError: index out of range: Tried to access index 512 out of table with 511 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418 #28

Open jacky0218 opened 2 years ago

jacky0218 commented 2 years ago

I got an error when running the following command (using my own data):

    python gens/gen_bert.py \
        --model albert-xxlarge-v1 \
        --dataset ACE05 \
        --save_attention 1 \
        --save_path ./wv/albert.ace05_with_heads.pkl

Error:

    98% 50/51 [00:40<00:00, 1.24it/s]
    Traceback (most recent call last):
      File "gens/gen_bert.py", line 286, in <module>
        embedding.embed(s)
      File "/usr/local/lib/python3.7/dist-packages/flair/embeddings.py", line 96, in embed
        self._add_embeddings_internal(sentences)
      File "gens/gen_bert.py", line 138, in _add_embeddings_internal
        tmp = self.model(all_input_ids, attention_mask=all_input_masks)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_albert.py", line 558, in forward
        input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_bert.py", line 174, in forward
        position_embeddings = self.position_embeddings(position_ids)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 532, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py", line 1484, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: index out of range: Tried to access index 512 out of table with 511 rows. at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

Is there any way to solve it? Thanks!

LorrinWWW commented 2 years ago

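It seems that the sequence is too long (i.e., more than 512 tokens after subword tokenization). The traceback fails in the position-embedding lookup, which only covers the first 512 positions.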
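One way to confirm this diagnosis before running `gen_bert.py` is to count subword tokens per sentence. The sketch below is a hypothetical helper, not part of the repository: it assumes each sentence is a list of words, that one `[CLS]` and one `[SEP]` are added per input, and it uses a toy tokenizer in place of the real ALBERT one.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: the checkpoint has 512 learned position embeddings, so one input
# may hold at most 512 subword tokens, special tokens included.
MAX_POSITIONS = 512
# Assumption: one [CLS] and one [SEP] are added around each sentence.
NUM_SPECIAL_TOKENS = 2


def find_overlong(
    sentences: Sequence[List[str]],
    tokenize: Callable[[str], List[str]],
    max_positions: int = MAX_POSITIONS,
) -> List[Tuple[int, int]]:
    """Return (sentence_index, subword_length) for every sentence that would overflow."""
    overlong = []
    for idx, words in enumerate(sentences):
        # Each word may be split into several subword pieces.
        n_subwords = sum(len(tokenize(w)) for w in words) + NUM_SPECIAL_TOKENS
        if n_subwords > max_positions:
            overlong.append((idx, n_subwords))
    return overlong


# Hypothetical toy tokenizer: splits every word into 3-character pieces.
def toy_tokenize(word: str) -> List[str]:
    return [word[i:i + 3] for i in range(0, len(word), 3)]


sentences = [["a"] * 10, ["abcdef"] * 300]
print(find_overlong(sentences, toy_tokenize))  # [(1, 602)]
```

With `transformers` installed, `AlbertTokenizer.from_pretrained("albert-xxlarge-v1").tokenize` could be passed as `tokenize` instead; tokenizing word by word only approximates the count for the full sentence.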

If you are OK with limiting the maximum sequence length, try truncating the input text (and drop any labels whose corresponding tokens are truncated), then check whether the error persists.
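A minimal sketch of that truncation, assuming a hypothetical word-level annotation format: entities as `(start, end, type)` with exclusive `end`, and relations as `(head_span, tail_span, type)`. The repository's actual ACE05 data layout may differ, so `truncate_example` and its field access are illustrative only. The budget is measured in subword tokens, since those are what the position embeddings index.

```python
from typing import Callable, List, Tuple

# Hypothetical annotation format (word indices, end exclusive).
Span = Tuple[int, int]
Entity = Tuple[int, int, str]
Relation = Tuple[Span, Span, str]


def truncate_example(
    words: List[str],
    entities: List[Entity],
    relations: List[Relation],
    tokenize: Callable[[str], List[str]],
    max_positions: int = 512,
    num_special_tokens: int = 2,  # assumption: [CLS] + [SEP]
) -> Tuple[List[str], List[Entity], List[Relation]]:
    budget = max_positions - num_special_tokens
    used = 0
    keep = 0  # number of leading words whose subwords fit in the budget
    for w in words:
        n = len(tokenize(w))
        if used + n > budget:
            break
        used += n
        keep += 1
    # An entity survives only if its whole span lies inside the kept prefix.
    kept_entities = [(s, e, t) for (s, e, t) in entities if e <= keep]
    # A relation survives only if both argument spans survive.
    kept_relations = [
        (h, t, r) for (h, t, r) in relations if h[1] <= keep and t[1] <= keep
    ]
    return words[:keep], kept_entities, kept_relations


# Toy example: one subword per word and a tiny 7-position limit.
words = [f"w{i}" for i in range(10)]
entities = [(0, 2, "PER"), (3, 5, "ORG"), (4, 7, "LOC")]
relations = [((0, 2), (3, 5), "EMP"), ((0, 2), (4, 7), "LOC_IN")]
w, e, r = truncate_example(words, entities, relations, lambda x: [x], max_positions=7)
print(len(w))  # 5
print(e)       # [(0, 2, 'PER'), (3, 5, 'ORG')]
print(r)       # [((0, 2), (3, 5), 'EMP')]
```

Cutting at word boundaries rather than mid-word keeps each kept word's subwords together, so word-level labels stay aligned with the truncated text.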

If you are attempting document-level extraction (i.e., inputs longer than 512 tokens), then unfortunately the current repository does not support this. You may need to look at other algorithms specifically designed for long-range, document-level relation extraction.