Xie-Minghui / EntityRelationExtraction

Entity and relation extraction using a pipeline approach, built on BiLSTM+CRF+BERT
115 stars 14 forks

Training error #13

Open leizhu1989 opened 1 year ago

leizhu1989 commented 1 year ago

Hello author, my environment is: pytorch: 1.7.0+cu110, transformers: 4.3.2, pytorch-crf: 0.7.2, GPU: Tesla V100. Training with trainer_ner fails with the following error:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [65,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [66,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [67,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [68,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [69,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [70,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [71,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [72,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [73,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [74,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [75,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [76,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [77,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [78,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [79,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [80,0,0] Assertion srcIndex < srcSelectDimSize failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [81,0,0] Assertion srcIndex < srcSelectDimSize failed.

Traceback (most recent call last):
  File "mains/trainer_ner.py", line 171, in <module>
    trainer.train()
  File "mains/trainer_ner.py", line 68, in train
    loss_ner, f1_ner, pred_ner = self.train_batch(data_item)
  File "mains/trainer_ner.py", line 93, in train_batch
    loss_ner, pred_ner = self.model(data_item)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/zhulei/NLP/EntityRelationExtraction/modules/model_ner.py", line 86, in forward
    hidden_init = torch.randn(2*self.num_layers, self.batch_size, self.hidden_dim).cuda()
RuntimeError: CUDA error: device-side assert triggered

It looks like the error comes from inside a CUDA kernel. What could be causing it?

leizhu1989 commented 1 year ago

Removing CUDA and running on the CPU also fails:

Run EntityRelationExtraction NER ...
STARTING TRAIN...
Epoch: 0
0%| | 0/6158 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mains/trainer_ner.py", line 171, in <module>
    trainer.train()
  File "mains/trainer_ner.py", line 68, in train
    loss_ner, f1_ner, pred_ner = self.train_batch(data_item)
  File "mains/trainer_ner.py", line 93, in train_batch
    loss_ner, pred_ner = self.model(data_item)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/zhulei/NLP/EntityRelationExtraction/modules/model_ner.py", line 80, in forward
    embeddings = self.word_embedding(data_item['text_tokened'].to(torch.int64))  # must be converted to int64
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

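Any pointers would be appreciated. I did not change anything else; I only changed the vocab length to 21128, the value given on the official site. Loading the JSON data raised no errors either.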
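The CPU traceback ends in the embedding lookup, so one way to narrow this down is to compare the token ids in a batch with the size of the embedding table. The following is only a minimal sketch: the helper name find_out_of_range_ids and the sample ids are hypothetical and not part of the repository, and it assumes the batch has been converted to nested lists of ints (for example with data_item['text_tokened'].tolist()).

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batch: Iterable[Iterable[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            if not 0 <= token_id < vocab_size:
                bad.append((row, col, token_id))
    return bad


# Hypothetical batch of token ids (assumption, not taken from the dataset).
batch = [[101, 2769, 21127, 102], [101, 21128, 30000, 102]]

# 21128 is the vocab length mentioned in the issue; valid ids are 0..21127.
offenders = find_out_of_range_ids(batch, vocab_size=21128)
print(offenders)
```

If the returned list is not empty, the ids were produced with a vocabulary larger than the embedding table, which would match the "index out of range in self" message.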

abbhay commented 10 months ago

Hey, did you get this running in the end?

Xie-Minghui commented 10 months ago

The vocabulary length must not be changed arbitrarily.
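A small sketch of why this matters, using a hypothetical toy vocabulary rather than the project's real one: torch.nn.Embedding only accepts ids in the range 0 to num_embeddings - 1, so a table sized with a number that does not match the vocabulary that produced the ids fails on lookup, while a table sized from that vocabulary works. The names word2id, too_small, and matched are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary standing in for the project's word-to-id map.
word2id = {"[PAD]": 0, "[UNK]": 1, "实": 2, "体": 3, "关": 4, "系": 5}
token_ids = torch.tensor([[2, 3, 4, 5]], dtype=torch.int64)

# Embedding table sized with a value that does not match the vocabulary:
# ids 4 and 5 fall outside the valid range 0..3.
too_small = nn.Embedding(num_embeddings=4, embedding_dim=8)
try:
    too_small(token_ids)
    lookup_failed = False
except IndexError as err:
    lookup_failed = True
    print("lookup failed:", err)

# Embedding table sized from the vocabulary that produced the ids.
matched = nn.Embedding(num_embeddings=len(word2id), embedding_dim=8)
vectors = matched(token_ids)
print(vectors.shape)
```

On a GPU the same mismatch tends to surface as a device-side assert reported at a later, unrelated line, which is why rerunning on the CPU gives a more readable error.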