PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle (PaddlePaddle low-code development tool)
Apache License 2.0

Training error: OSError: (External) CUDA error(719), unspecified launch failure. [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. #1604

Open monkeycc opened 2 years ago

monkeycc commented 2 years ago

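I start training from my own thread. Classification models train without problems and YOLO models train without problems, but FasterRCNN fails with the error below, which is strange.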

W0922 09:51:58.184947 12832 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.2
W0922 09:51:58.191946 12832 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
num_epochs222 12
Daochumoxingmulu train pretrained_dir F:/AI/1233_ResNet50_vd_2022-09-22_09_51_57
pretrained_dir osp.join output/faster_rcnn_r50_fpn
2022-09-22 09:51:58 [INFO]      Loading pretrained model from E:/Intsoft_Pretrain2/FasterRCNN_ResNet50_fpn_IMAGENET\ResNet50_cos_pretrained.pdparams
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res2_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res2_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res2_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res2_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res3_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res3_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res3_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res3_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res4_sum_lateral.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res4_sum_lateral.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res4_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res4_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res5_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_inner_res5_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res5_sum.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   neck.fpn_res5_sum.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_feat.rpn_conv.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_feat.rpn_conv.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_score.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_score.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_delta.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   rpn_head.rpn_rois_delta.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc6.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc6.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc7.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.head.fc7.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_score.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_score.bias is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_delta.weight is not in pretrained model
2022-09-22 09:51:58 [WARNING]   bbox_head.bbox_delta.bias is not in pretrained model
2022-09-22 09:51:58 [INFO]      There are 265/295 variables loaded into FasterRCNN.
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1110529498]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109329307]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1157890495]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112468260]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1111433916]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1113625662]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1108166455]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1130415248]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1114690607]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1106957141]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1118707754]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109324224]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1110389007]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1146318115]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1119027899]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1111728838]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1117886148]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1116087952]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1104819815]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1115156369]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1125361397]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1113285200]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112926126]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112495624]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1133989623]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1131700321]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1109823092]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1112164588]
Error: ../paddle/phi/kernels/funcs/scatter.cu.h:66 Assertion `scatter_i >= 0` failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be greater than or equal to 0, but received [-1115624544]
Exception in thread paddle_train_MB:
Traceback (most recent call last):
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "f:\intsoft_AI_08\IntsoftAI.py", line 1740, in paddle_train_MB
    use_vdl=True)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 1388, in train
    early_stop_patience, use_vdl, resume_checkpoint)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 339, in train
    use_vdl=use_vdl)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\base.py", line 343, in train_loop
    outputs = self.run(self.net, data, mode='train')
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\cv\models\detector.py", line 105, in run
    net_out = net(inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\meta_arch.py", line 59, in forward
    out = self.get_loss()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 95, in get_loss
    rpn_loss, bbox_loss = self._forward()
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\architectures\faster_rcnn.py", line 76, in _forward
    rois, rois_num, rpn_loss = self.rpn_head(body_feats, self.inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\rpn_head.py", line 135, in forward
    loss = self.get_loss(scores, deltas, anchors, inputs)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\rpn_head.py", line 224, in get_loss
    anchors)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target_layer.py", line 92, in __call__
    assign_on_cpu=self.assign_on_cpu)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target.py", line 41, in rpn_anchor_target
    ignore_thresh, is_crowd_i, assign_on_cpu)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\proposal_generator\target.py", line 83, in label_box
    iou = bbox_overlaps(gt_boxes, anchors)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\bbox_utils.py", line 129, in bbox_overlaps
    area1 = bbox_area(boxes1)
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddlex\ppdet\modeling\bbox_utils.py", line 111, in bbox_area
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
  File "D:\Anaconda3\envs\PaddleDabao11237\lib\site-packages\paddle\fluid\dygraph\varbase_patch_methods.py", line 740, in __getitem__
    return self._getitem_index_not_tensor(item)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)
  [operator < slice > error]
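
The repeated `scatter_i >= 0` assertions in the log mean that a scatter kernel received negative indices. As a rough illustration of what that check guards against, here is a plain-Python sketch. It is not Paddle's kernel, and `checked_scatter` is a hypothetical helper introduced only for this example.

```python
def checked_scatter(dst, index, updates):
    """Write updates[k] into position index[k] of a copy of dst.

    Hypothetical helper for illustration only. It mirrors the intent of the
    `scatter_i >= 0` assertion in the log, not Paddle's implementation.
    """
    if len(index) != len(updates):
        raise ValueError("index and updates must have the same length")
    out = list(dst)  # copy so the caller's list is left untouched
    for k, i in enumerate(index):
        if i < 0 or i >= len(out):
            raise IndexError(
                f"scatter index {i} at position {k} is out of bounds "
                f"for size {len(out)}"
            )
        out[i] = updates[k]
    return out


# Valid indices: positions 1 and 3 are overwritten.
labels = checked_scatter([0, 0, 0, 0], [1, 3], [1, 1])
print(labels)

try:
    # One of the index values reported in the log above.
    checked_scatter([0, 0, 0, 0], [-1110529498], [1])
except IndexError as exc:
    print("rejected:", exc)
```

CUDA reports kernel failures asynchronously, which may be why the traceback ends in a `slice` operation inside `bbox_area` even though the assertions come from `scatter.cu.h`.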

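Environment

  1. Please provide the PaddlePaddle and PaddleX versions you are using: paddlepaddle-gpu 2.3.2.post112, paddlex 2.1.0

  2. Please provide your operating system (Linux/Windows/MacOS): Windows

  3. Which Python version are you using? Python 3.7.13

  4. Which CUDA/cuDNN versions are you using? CUDA 11.6, cuDNN 8.20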
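
Part of the environment list above can be collected by a script rather than typed by hand. The following is a minimal sketch using only the standard library. `installed_version` and `environment_report` are hypothetical helper names, and the package names are the ones quoted in this report. CUDA and cuDNN versions are not covered; the `gpu_resources.cc` lines at the top of the log already print them.

```python
import platform
import sys

try:
    from importlib import metadata  # available on Python 3.8+
except ImportError:  # Python 3.7, as used in this report
    metadata = None


def installed_version(package):
    """Return the installed version string of `package`, or None if unknown."""
    if metadata is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("paddlepaddle-gpu", "paddlex")):
    """Collect OS name, Python version, and package versions into a dict."""
    report = {
        "os": platform.system(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        report[name] = installed_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

On Python 3.7 `importlib.metadata` is not available, so the package entries come back as `None`; `pip list` gives the same information there.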

monkeycc commented 2 years ago

Problem solved.

I switched to paddlepaddle-gpu==2.2.2.

It looks like 2.3 still has quite a few problems.
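
If a training script should warn when it runs on the release series that failed here, a small version check is one option. This is only a sketch based on this single report (2.3.2.post112 failed, 2.2.2 worked), not an official list of affected versions; `release_tuple` and `is_affected_series` are hypothetical names.

```python
def release_tuple(version):
    """Parse the leading numeric parts, e.g. '2.3.2.post112' -> (2, 3, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post112'
        parts.append(int(piece))
    return tuple(parts)


def is_affected_series(version, series=(2, 3)):
    """True if `version` belongs to the given major.minor series."""
    return release_tuple(version)[:2] == tuple(series)


for v in ("2.3.2.post112", "2.2.2"):
    if is_affected_series(v):
        status = "2.3 series (failed in this report)"
    else:
        status = "not in the 2.3 series"
    print(v, "->", status)
```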