PaddlePaddle / Paddle

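PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)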
http://www.paddlepaddle.org/
Apache License 2.0

paddlepaddle-gpu 2.3.2 training error: return self._getitem_index_not_tensor(item) OSError: (External) CUDA error(719), unspecified launch failure. #46386

Open monkeycc opened 2 years ago

monkeycc commented 2 years ago

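Please ask your question

Training was run through the PaddleX framework, but the error is raised by Paddle.

Other models train without problems; only FasterRCNN fails.

Switching to paddlepaddle-gpu==2.2.2 resolved the problem.

I hope the issue in 2.3 can be fixed.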

W0922 09:51:58.184947 12832 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
W0922 09:51:58.191946 12832 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
num_epochs222 12
Daochumoxingmulu train pretrained_dir F:/AI/1233_ResNet50_vd_2022-09-22_09_51_57
pretrained_dir osp.join output/faster_rcnn_r50_fpn
2022-09-22 09:51:58 [INFO]      Loading pretrained model from E:/Intsoft_Pretrain2/FasterRCNN_ResNet50_fpn_IMAGENET\ResNet50_cos_pretrained.pdparams
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res2_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res2_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res2_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res2_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res3_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res3_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res3_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res3_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res4_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res4_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res4_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res4_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res5_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res5_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res5_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res5_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_feat.rpn_conv.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_feat.rpn_conv.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_score.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_score.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_delta.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_delta.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc6.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc6.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc7.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc7.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_score.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_score.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_delta.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_delta.bias is not in pretrained model
2022-09-22 09:51:58 [INFO]      There are 265/295 variables loaded into FasterRCNN.
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1110529498]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109329307]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1157890495]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112468260]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1111433916]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1113625662]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1108166455]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1130415248]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1114690607]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1106957141]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1118707754]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109324224]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1110389007]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1146318115]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1119027899]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1111728838]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1117886148]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1116087952]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1104819815]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1115156369]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1125361397]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1113285200]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112926126]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112495624]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1133989623]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1131700321]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109823092]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112164588]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1115624544]
Exception in thread paddle_train_MB:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "f:\intsoft_AI_08\IntsoftAI.py", line 1740, in paddle_train_MB
    use_vdl=True)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 1388, in train
    early_stop_patience, use_vdl, resume_checkpoint)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 339, in train
    use_vdl=use_vdl)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\base.py", line 343, in train_loop
    outputs = self.run(self.net, data, mode='train')
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 105, in run
    net_out = net(inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
    out = self.get_loss()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 95, in get_loss
    rpn_loss, bbox_loss = self._forward()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 76, in _forward
    rois, rois_num, rpn_loss = self.rpn_head(body_feats, self.inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\rpn_head.py", line 135, in forward
    loss = self.get_loss(scores, deltas, anchors, inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\rpn_head.py", line 224, in get_loss
    anchors)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target_layer.py", line 92, in __call__
    assign_on_cpu=self.assign_on_cpu)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target.py", line 41, in rpn_anchor_target
    ignore_thresh, is_crowd_i, assign_on_cpu)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target.py", line 83, in label_box
    iou = bbox_overlaps(gt_boxes, anchors)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\bbox_utils.py", line 129, in bbox_overlaps
    area1 = bbox_area(boxes1)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\bbox_utils.py", line 111, in bbox_area
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\varbase_patch_methods.py", line 740, in __getitem__
    return self._getitem_index_not_tensor(item)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)
  [operator < slice > error]
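
The indices reported by the `scatter_i >= 0` assertions are all large negative numbers of similar magnitude, which is not what a valid index tensor would normally contain. One hypothesis (an assumption, not confirmed anywhere in this thread) is that the index buffer holds bytes of another type, such as float32. A minimal sketch for checking that idea reinterprets a few of the reported values as float32 bit patterns; `int32_bits_as_float32` is a hypothetical helper, not a Paddle API:

```python
import struct


def int32_bits_as_float32(value: int) -> float:
    """Reinterpret the 32 bits of a signed int32 as an IEEE-754 float32."""
    return struct.unpack("<f", struct.pack("<i", value))[0]


# A few of the index values copied from the assertion messages in the log.
reported = [-1110529498, -1109329307, -1157890495]
as_floats = [int32_bits_as_float32(v) for v in reported]

for raw, as_float in zip(reported, as_floats):
    print(raw, "->", as_float)
```

If the results are small, plausible-looking floating-point numbers, that is consistent with (but does not prove) the index memory being filled with non-index data.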

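Environment

  1. Please provide the versions of PaddlePaddle and PaddleX you are using: paddlepaddle-gpu 2.3.2.post112, paddlex 2.1.0

  2. Please provide your operating system (Linux/Windows/MacOS): Windows

  3. Which Python version are you using? Python 3.7.13

  4. Which CUDA/cuDNN versions are you using? cuda11.6 cudnn 8.20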


paddle-bot[bot] commented 2 years ago

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also check the official API documentation, the FAQ, past GitHub issues, and the AI community for answers. Have a nice day!

1649759610 commented 2 years ago

Hello, could you provide code that reproduces the problem?
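
When preparing a reproduction, it may help to know that CUDA kernel launches are asynchronous, so the operator named in the Python traceback (`slice` here) is not necessarily the one that failed; the `scatter` assertions printed earlier in the log may be the real origin. A minimal sketch, assuming the standard CUDA environment variable `CUDA_LAUNCH_BLOCKING` is honored in this setup, forces synchronous launches so the error surfaces at the offending call:

```python
import os

# Must be set before the CUDA context is created, i.e. before importing paddle.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import paddle  # import the framework only after the variable is set
# ... then run the failing training script as usual ...

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Training runs noticeably slower with this setting, so it is only intended for debugging.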

1649759610 commented 2 years ago

Paddle 2.4 will be released soon; you can check whether it resolves this problem.

dingjiaweiww commented 2 years ago

Hello, if you still have questions, please provide reproduction code and we will test it.
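
Since the traceback passes through `label_box` and `bbox_overlaps(gt_boxes, anchors)`, the ground-truth annotations are one input worth ruling out before building a reproduction. The following is a sketch only: `find_bad_boxes` is a hypothetical helper (not part of PaddleX), and it assumes boxes are stored as `[x1, y1, x2, y2]`, which matches the indexing in `bbox_area`. Nothing in this thread confirms that bad annotations are the cause.

```python
import math
from typing import List, Sequence, Tuple


def find_bad_boxes(boxes: Sequence[Sequence[float]]) -> List[Tuple[int, str]]:
    """Return (row index, reason) for every malformed [x1, y1, x2, y2] box."""
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append((i, "expected 4 coordinates"))
            continue
        x1, y1, x2, y2 = box
        if not all(math.isfinite(v) for v in box):
            problems.append((i, "non-finite coordinate"))
        elif x2 <= x1 or y2 <= y1:
            problems.append((i, "zero or negative width/height"))
    return problems


# Example: one valid box, one zero-width box, one box containing NaN.
sample = [[10, 20, 50, 80], [30, 30, 30, 60], [0, 0, float("nan"), 5]]
print(find_bad_boxes(sample))
```

Running such a check over the whole dataset and attaching any offending annotations to the issue would make the report easier to reproduce.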

RileyShe commented 1 year ago

paddlenlp 2.5.2.post0 paddleocr 2.6.1.3 paddlepaddle-gpu 0.0.0.post117

Error message: input: Error: /paddle/paddle/phi/kernels/funcs/gather.cu.h:67 Assertion `index_value >= 0 && index_value < input_dims[j]` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [16] and greater than or equal to 0, but received [0]

I am using the latest code from the develop branch. What could be the cause? @dingjiaweiww @1649759610
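
Note that the message above reports a received value of 0, which satisfies the stated bounds of `[0, 16)`, so the printed value may not be the one that actually failed. One way to narrow this down is to check the index values on the host before they reach the gather call. The sketch below uses a hypothetical helper, `out_of_bounds_positions`, on a plain Python list; it assumes the index tensor has first been copied to the host (for example with `.numpy().tolist()`).

```python
from typing import List, Sequence


def out_of_bounds_positions(indices: Sequence[int], dim_size: int) -> List[int]:
    """Return the positions in `indices` whose value is outside [0, dim_size)."""
    return [pos for pos, idx in enumerate(indices) if not 0 <= idx < dim_size]


# Example with a gathered dimension of size 16, as in the error message.
indices = [0, 3, 15, 16, -1]
bad = out_of_bounds_positions(indices, dim_size=16)
print(bad)
```

An empty result for every batch would suggest the indices are valid on the host and that the problem lies elsewhere, for example in how the data reaches the kernel.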

Plusmile commented 1 year ago

paddlenlp 2.5.2.post0 paddleocr 2.6.1.3 paddlepaddle-gpu 0.0.0.post117

Error message: input: Error: /paddle/paddle/phi/kernels/funcs/gather.cu.h:67 Assertion `index_value >= 0 && index_value < input_dims[j]` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [16] and greater than or equal to 0, but received [0]

I am using the latest code from the develop branch. What could be the cause? @dingjiaweiww @1649759610

Has this problem been resolved?

tameingjet commented 11 months ago

Training PP-YOLOE-R under WSL2 also hits this problem, but it does not occur on Windows.