Open leizhu1989 opened 1 year ago
Removing CUDA and running on the CPU also raises an error:

Run EntityRelationExtraction NER ...
STARTING TRAIN...
Epoch: 0
0%| | 0/6158 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "mains/trainer_ner.py", line 171, in <module>
    trainer.train()
  File "mains/trainer_ner.py", line 68, in train
    loss_ner, f1_ner, pred_ner = self.train_batch(data_item)
  File "mains/trainer_ner.py", line 93, in train_batch
    loss_ner, pred_ner = self.model(data_item)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/zhulei/NLP/EntityRelationExtraction/modules/model_ner.py", line 80, in forward
    embeddings = self.word_embedding(data_item['text_tokened'].to(torch.int64))  # must be converted to int64
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/functional.py", line 1852, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Could someone point me in the right direction? I haven't modified anything else; I only changed the vocab length, using the value from the official site, which is 21128. Loading the JSON data did not raise any errors either.
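For readers hitting the same error: `IndexError: index out of range in self` raised from `torch.embedding` means at least one token ID in the batch falls outside the valid range of the embedding table, which is 0 to `num_embeddings - 1`. A minimal sketch of a check that locates such IDs is below. The helper name `find_out_of_range_ids` and the toy batch are made up for illustration and are not part of this repository; in the project one could presumably pass `data_item['text_tokened'].tolist()` and `self.word_embedding.num_embeddings` to it.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batch: Iterable[Iterable[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every ID the embedding table cannot index.

    Hypothetical helper for illustration; not part of the repository.
    """
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            # Valid indices for an embedding table are 0 .. num_embeddings - 1.
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, col, token_id))
    return bad


# Toy batch: 21128 is one past the last valid index of a 21128-row table.
toy_batch = [[101, 2769, 102], [101, 21128, 102]]
print(find_out_of_range_ids(toy_batch, num_embeddings=21128))
```

If the check reports any IDs, the vocabulary that produced the token IDs and the size of the embedding table disagree, and one of the two has to be brought in line with the other.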
Hey, did you get this running successfully in the end?
Hello author, my environment is: pytorch: 1.7.0+cu110, transformers: 4.3.2, pytorch-crf: 0.7.2, GPU: Tesla V100. Training with trainer_ner raised the following error:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [65,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [66,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [67,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [68,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [69,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [70,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [71,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [72,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [73,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [74,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [75,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [76,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [77,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [78,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [79,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [80,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [430,0,0], thread: [81,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "mains/trainer_ner.py", line 171, in <module>
    trainer.train()
  File "mains/trainer_ner.py", line 68, in train
    loss_ner, f1_ner, pred_ner = self.train_batch(data_item)
  File "mains/trainer_ner.py", line 93, in train_batch
    loss_ner, pred_ner = self.model(data_item)
  File "/home/ctgcdt/anaconda3/envs/nlp_sentSim_gpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ctgcdt/zhulei/NLP/EntityRelationExtraction/modules/model_ner.py", line 86, in forward
    hidden_init = torch.randn(2*self.num_layers, self.batch_size, self.hidden_dim).cuda()
RuntimeError: CUDA error: device-side assert triggered

This looks like an error raised inside a CUDA kernel. What could be the cause?
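A note on reading this log: CUDA kernels are launched asynchronously, so the Python line shown in the traceback (the `torch.randn(...).cuda()` call) is not necessarily the operation that failed. The `indexSelectLargeIndex` assertion `srcIndex < srcSelectDimSize` is the GPU-side form of an out-of-range index in an index-select such as an embedding lookup, which matches the CPU error reported at the top of this thread. A minimal sketch for getting a traceback that points at the failing operation, assuming the variable is set before CUDA is initialized:

```python
import os

# Force synchronous kernel launches so a device-side assert is reported
# at the Python line that actually launched the failing kernel.
# Set this before the first CUDA call (simplest: before importing torch)
# so that it is in place when CUDA initializes.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework and start training only after this point
```

The same effect can be had by exporting `CUDA_LAUNCH_BLOCKING=1` in the shell before launching `mains/trainer_ner.py`. Running once on the CPU, as in the first post, is another way to obtain a readable error.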
The vocabulary length must not be changed arbitrarily.
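One way to avoid hand-editing the vocabulary length is to derive it from the vocab file that actually produced the token IDs. The sketch below assumes a BERT-style `vocab.txt` with one token per line, where the line position is the token ID; the function name `count_vocab_entries` and the demo file are hypothetical, and how this project stores its vocabulary size is not shown in the thread.

```python
import tempfile
from pathlib import Path


def count_vocab_entries(vocab_path: str) -> int:
    """Count the entries in a BERT-style vocab file (one token per line).

    Hypothetical helper for illustration; not part of the repository.
    """
    with Path(vocab_path).open(encoding="utf-8") as handle:
        # Line i (0-based) holds the token with ID i, so the number of
        # lines equals the number of rows the embedding table needs.
        return sum(1 for _ in handle)


# Demo on a tiny throwaway vocab file with four tokens.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "vocab.txt"
    demo_path.write_text("[PAD]\n[UNK]\n[CLS]\n[SEP]\n", encoding="utf-8")
    print(count_vocab_entries(str(demo_path)))
```

The resulting count is what the embedding layer's `num_embeddings` should be at least as large as, provided the data was tokenized with that same file.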