liuyan20062010 closed this issue 2 years ago.
```
Running training
  Num examples = 2582
  Num Epochs = 10
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 1620
  0%| | 0/1620 [00:00<?, ?it/s]
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1595629427478/work/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "train.py", line 107, in <module>
    trainer.train()
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 1422, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 2011, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/trainer.py", line 2043, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 499, in forward
    **kwargs_decoder,
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 958, in forward
    return_dict=return_dict,
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 651, in forward
    attention_mask, input_shape, inputs_embeds, past_key_values_length
  File "/root/miniconda3/envs/torch1.2_yan/lib/python3.7/site-packages/transformers/models/trocr/modeling_trocr.py", line 523, in _prepare_decoder_attention_mask
    ).to(self.device)
RuntimeError: CUDA error: device-side assert triggered
  0%| | 0/1620 [00:01<?, ?it/s]
```
Has anyone else run into this problem? I'd appreciate any advice on how to fix it.
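For readers hitting the same log: the failed assertion `srcIndex < srcSelectDimSize` comes from an index-select kernel, meaning some index was greater than or equal to the size of the dimension being selected from. In practice this is most often a token or position id passed to an `nn.Embedding` that is larger than the table. A minimal CPU sketch of that failure mode, using a toy embedding rather than the TrOCR model from this thread:

```python
import torch

# A toy embedding table with 10 rows: valid indices are 0..9.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

good_ids = torch.tensor([[1, 2, 9]])
bad_ids = torch.tensor([[1, 2, 10]])  # 10 is out of range for a 10-row table

# In-range lookup: one 4-dim vector per id.
print(embedding(good_ids).shape)

try:
    embedding(bad_ids)
except (IndexError, RuntimeError) as err:
    # On CPU, PyTorch raises a Python exception (the exact type depends on
    # the PyTorch version); on CUDA the same lookup trips the device-side
    # assert `srcIndex < srcSelectDimSize` seen in the log above.
    print("out-of-range lookup:", err)
```

Because CUDA kernels run asynchronously, the Python frame in the traceback (here `).to(self.device)` in `_prepare_decoder_attention_mask`) is not necessarily where the bad lookup happened. Rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1`, or running a single batch on CPU, usually yields a traceback that points at the real operation.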
Hi, did you ever solve that problem? @liuyan20062010
Hi, have you solved it yet? In my case it was a problem with the transformers version; downgrading to 4.15 fixed it.
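The fix reported in this thread was downgrading transformers to 4.15. Independently of the version, a cheap pre-flight check before `trainer.train()` can rule out one common trigger of this assertion: label ids that do not fit the decoder's vocabulary (for example, a tokenizer whose vocabulary is larger than the decoder embedding). The sketch below assumes labels are plain lists of ints padded with `-100`; `find_bad_token_ids` is a hypothetical helper, not part of transformers.

```python
from typing import Iterable, List, Tuple


def find_bad_token_ids(
    label_batches: Iterable[List[int]],
    vocab_size: int,
    ignore_index: int = -100,
) -> List[Tuple[int, int, int]]:
    """Return (example_index, position, token_id) for every id that an
    embedding with `vocab_size` rows could not look up (id < 0 or
    id >= vocab_size). Entries equal to `ignore_index` (the usual loss
    padding value) are skipped."""
    bad = []
    for i, labels in enumerate(label_batches):
        for pos, tok in enumerate(labels):
            if tok == ignore_index:
                continue
            if tok < 0 or tok >= vocab_size:
                bad.append((i, pos, tok))
    return bad


if __name__ == "__main__":
    # Toy data: the second example contains id 10, which a 10-row table cannot hold.
    batches = [[0, 5, -100], [3, 10]]
    print(find_bad_token_ids(batches, vocab_size=10))  # [(1, 1, 10)]
```

With a `VisionEncoderDecoderModel`, one reasonable value for `vocab_size` is `model.decoder.config.vocab_size`; comparing it with `len(tokenizer)` is a quick way to spot a tokenizer/decoder mismatch.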