Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0
OSError: (External) CUDA error(719), unspecified launch failure. #8492
[X] I have searched the issues and found no similar bug report.
Bug Component
Training
Describe the Bug
Warning: Unable to use JDE/FairMOT/ByteTrack, please install lap, for example: pip install lap, see https://github.com/gatagat/lap
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): pip install numba==0.56.4
Warning: Unable to use numba in PP-Tracking, please install numba, for example(python3.7): pip install numba==0.56.4
Warning: Unable to use MOT metric, please install motmetrics, for example: pip install motmetrics, see https://github.com/longcw/py-motmetrics
Warning: Unable to use MCMOT metric, please install motmetrics, for example: pip install motmetrics, see https://github.com/longcw/py-motmetrics
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
[07/29 10:23:03] ppdet.data.source.coco INFO: Load [3271 samples valid, 11 samples invalid] in file dataset/mydata/train/annotations/_annotations.coco.json.
W0729 10:23:03.598515 797620 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.0, Runtime API Version: 11.8
W0729 10:23:03.599011 797620 gpu_resources.cc:149] device: 0, cuDNN Version: 8.8.
[07/29 10:23:05] ppdet.utils.checkpoint INFO: ['fc.bias', 'fc.weight', 'last_conv.weight'] in pretrained weight is not used in the model, and its will not be loaded
[07/29 10:23:05] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/user/.cache/paddle/weights/PPHGNetV2_X_ssld_pretrained.pdparams
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Error: ../paddle/phi/kernels/funcs/gather.cu.h:189 Assertion index_val >= 0 && index_val < input_index_dim_size failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]
Traceback (most recent call last):
File "/home/user/test1/PaddleDetection/tools/train.py", line 209, in <module>
main()
File "/home/user/test1/PaddleDetection/tools/train.py", line 205, in main
run(FLAGS, cfg)
File "/home/user/test1/PaddleDetection/tools/train.py", line 158, in run
trainer.train(FLAGS.eval)
File "/home/user/test1/PaddleDetection/ppdet/engine/trainer.py", line 577, in train
outputs = model(data)
File "/home/user/anaconda3/envs/paddle/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/home/user/test1/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 60, in forward
out = self.get_loss()
File "/home/user/test1/PaddleDetection/ppdet/modeling/architectures/detr.py", line 115, in get_loss
return self._forward()
File "/home/user/test1/PaddleDetection/ppdet/modeling/architectures/detr.py", line 93, in _forward
detr_losses = self.detr_head(out_transformer, body_feats,
File "/home/user/anaconda3/envs/paddle/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/home/user/test1/PaddleDetection/ppdet/modeling/heads/detr_head.py", line 453, in forward
return self.loss(
File "/home/user/anaconda3/envs/paddle/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/home/user/test1/PaddleDetection/ppdet/modeling/losses/detr_loss.py", line 434, in forward
total_loss = super(DINOLoss, self).forward(
File "/home/user/test1/PaddleDetection/ppdet/modeling/losses/detr_loss.py", line 388, in forward
total_loss = self._get_prediction_loss(
File "/home/user/test1/PaddleDetection/ppdet/modeling/losses/detr_loss.py", line 322, in _get_prediction_loss
match_indices = self.matcher(
File "/home/user/anaconda3/envs/paddle/lib/python3.9/site-packages/paddle/nn/layer/layers.py", line 1254, in __call__
return self.forward(*inputs, **kwargs)
File "/home/user/test1/PaddleDetection/ppdet/modeling/transformers/matchers.py", line 178, in forward
indices = [
File "/home/user/test1/PaddleDetection/ppdet/modeling/transformers/matchers.py", line 179, in <listcomp>
linear_sum_assignment(c.split(sizes, -1)[i].numpy())
OSError: (External) CUDA error(719), unspecified launch failure.
[Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:267)
training error
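One possible cause worth ruling out (nothing in this report confirms it) is a mismatch between the annotation file and the training config. The log reports 11 invalid samples, and the traceback surfaces inside the matcher of the DINO loss, which consumes the ground-truth labels and boxes. The following is a minimal sketch, using only the standard library, that summarizes a COCO-style annotation file. The names `summarize_coco` and `load_and_summarize` and the `num_classes` argument are hypothetical helpers for illustration, not part of PaddleDetection.

```python
import json
from collections import Counter


def summarize_coco(coco: dict, num_classes: int) -> dict:
    """Collect basic sanity statistics from a COCO-style annotation dict.

    num_classes is the value the caller expects the training config to use;
    it is an assumption supplied by the caller, not read from any config.
    """
    annotations = coco.get("annotations", [])
    category_ids = sorted(c["id"] for c in coco.get("categories", []))
    used_ids = Counter(a["category_id"] for a in annotations)

    # COCO boxes are [x, y, width, height]; non-positive sizes are degenerate.
    degenerate = [
        a["id"] for a in annotations
        if a["bbox"][2] <= 0 or a["bbox"][3] <= 0
    ]

    # Images that have no annotation at all.
    annotated_images = {a["image_id"] for a in annotations}
    empty_images = [
        img["id"] for img in coco.get("images", [])
        if img["id"] not in annotated_images
    ]

    return {
        "category_ids": category_ids,
        # Category ids referenced by annotations but never declared.
        "unknown_category_ids": sorted(set(used_ids) - set(category_ids)),
        "num_categories_matches_config": len(category_ids) == num_classes,
        "degenerate_box_annotation_ids": degenerate,
        "images_without_annotations": empty_images,
    }


def load_and_summarize(path: str, num_classes: int) -> dict:
    """Read an annotation file from disk and summarize it."""
    with open(path, "r", encoding="utf-8") as f:
        return summarize_coco(json.load(f), num_classes)
```

For the dataset in this report, `path` would be the file printed in the log (`dataset/mydata/train/annotations/_annotations.coco.json`), and `num_classes` would be whatever the training config sets; that value is not given here.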
Environment
OS: Linux
Ver: Paddle-gpu 2.5.0
cuda 11.2 ~ 12.0
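CUDA kernels are launched asynchronously, so the Python frame at the bottom of the traceback (`.numpy()` in `matchers.py`) is probably just the first point where the host synchronized with the device, not necessarily where the failing `gather` was issued. Setting the CUDA environment variable `CUDA_LAUNCH_BLOCKING=1` forces synchronous launches and usually yields a traceback that points closer to the offending operation. A minimal sketch of such a rerun follows. The helper names are hypothetical, and the config path has to be supplied by the caller because this report does not say which config was used.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None) -> dict:
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With CUDA_LAUNCH_BLOCKING=1 each kernel launch blocks until it finishes,
    # so a device-side failure is reported at the launch that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_train_command(config_path: str) -> list:
    """Build the training command; config_path is a caller-supplied assumption."""
    return [sys.executable, "tools/train.py", "-c", config_path]


def rerun_training(config_path: str) -> int:
    """Run training once with blocking launches and return the exit code.

    Intended to be run from the PaddleDetection repository root.
    """
    result = subprocess.run(build_train_command(config_path), env=blocking_env())
    return result.returncode
```

Training runs noticeably slower in this mode, so it is meant for a single diagnostic run rather than regular use.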
Bug description confirmation
[X] I confirm that the bug reproduction steps, a description of the code changes, and the environment information have been provided, and that the problem can be reproduced.
Are you willing to submit a PR?