OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

I got RuntimeError: CUDNN_STATUS_INTERNAL_ERROR #454

Closed zhang-wen closed 6 years ago

zhang-wen commented 6 years ago

/opt/conda/conda-bld/pytorch_1512378360668/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [94,0,0], thread: [29,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1512378360668/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [94,0,0], thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1512378360668/work/torch/lib/THC/THCTensorIndex.cu:325: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [94,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File '_main.py', line 202, in <module>
    main()
  File '_main.py', line 197, in main
    trainer.train()
  File '/home/wen/1.research/zh-en/iwslt/with_transformer/trainer.py', line 126, in train
    outputs = self.model(src, trg)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py', line 325, in __call__
    result = self.forward(*input, **kwargs)
  File '/home/wen/1.research/zh-en/iwslt/with_transformer/models/transformer.py', line 65, in forward
    enc_output, enc_slf_attn = self.encoder(src_seq, src_pos)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py', line 325, in __call__
    result = self.forward(*input, **kwargs)
  File '/home/wen/1.research/zh-en/iwslt/with_transformer/models/transformer.py', line 244, in forward
    enc_out, enc_slf_attn = enc_layer(enc_out, src_slf_attn_mask)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py', line 325, in __call__
    result = self.forward(*input, **kwargs)
  File '/home/wen/1.research/zh-en/iwslt/with_transformer/models/transformer.py', line 205, in forward
    enc_output = self.pos_ffn(enc_output)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py', line 325, in __call__
    result = self.forward(*input, **kwargs)
  File '/home/wen/1.research/zh-en/iwslt/with_transformer/models/transformer.py', line 181, in forward
    output = self.dropout(self.w_2(self.relu(self.w_1(x))))
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py', line 325, in __call__
    result = self.forward(*input, **kwargs)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/modules/conv.py', line 166, in forward
    self.padding, self.dilation, self.groups)
  File '/home/wen/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py', line 54, in conv1d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR

Has anyone else seen this error? I don't know why it happens during training.

JianyuZhan commented 6 years ago

You might consider updating your CUDA driver: https://developer.nvidia.com/cudnn
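
It may also be worth checking the inputs. The `Assertion srcIndex < srcSelectDimSize failed` lines in the log come from an index-select kernel, which usually means some index (for example a token id or position id fed to an embedding table) is at least as large as the table itself. The cuDNN error reported later in `conv1d` could then be a follow-on symptom rather than the root cause. Below is a minimal sketch of a check to run before the forward pass, assuming the ids are available as nested Python lists (with tensors, something like `src_seq.tolist()` should work). The helper `find_out_of_range` and the sample values are hypothetical and not part of OpenNMT-py or the poster's code.

```python
def find_out_of_range(indices, num_embeddings):
    """Return (row, col, value) for every index outside [0, num_embeddings)."""
    bad = []
    for row, seq in enumerate(indices):
        for col, idx in enumerate(seq):
            # A valid embedding index must satisfy 0 <= idx < num_embeddings.
            if not 0 <= idx < num_embeddings:
                bad.append((row, col, idx))
    return bad


# Hypothetical batch of ids and a hypothetical table size of 5.
batch = [[1, 4, 2], [0, 5, 3]]
print(find_out_of_range(batch, num_embeddings=5))  # prints [(1, 1, 5)]
```

If this reports anything for the token ids or the position ids, the vocabulary size or the maximum sequence length used to build the embedding tables is probably too small for the data.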

zhang-wen commented 6 years ago

@JianyuZhan Thank you, I have solved my problem.

sammichenVV commented 1 year ago

Hello, I had the same problem. How did you solve it?