Action recognition based on human ID detection (error when training the smoking model)

cheesezoella commented 1 year ago

Search before asking

[X] I have searched the question and found no related answer.

Please ask your question

Hello, I am training on a dataset I prepared myself, but training fails with an error. I followed the approach described in https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md to train the model. Thanks!

Command I ran:

python -m paddle.distributed.launch --gpus 0 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval --amp

W0729 09:54:24.241298 42635 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.2
W0729 09:54:24.253404 42635 device_context.cc:465] device: 0, cuDNN Version: 8.1.
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10] in pretrained weight yolo_head.pred_cls.0.bias is unmatched with the shape [1] in model yolo_head.pred_cls.0.bias. And the weight yolo_head.pred_cls.0.bias will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10, 384, 3, 3] in pretrained weight yolo_head.pred_cls.0.weight is unmatched with the shape [1, 384, 3, 3] in model yolo_head.pred_cls.0.weight. And the weight yolo_head.pred_cls.0.weight will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10] in pretrained weight yolo_head.pred_cls.1.bias is unmatched with the shape [1] in model yolo_head.pred_cls.1.bias. And the weight yolo_head.pred_cls.1.bias will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10, 192, 3, 3] in pretrained weight yolo_head.pred_cls.1.weight is unmatched with the shape [1, 192, 3, 3] in model yolo_head.pred_cls.1.weight. And the weight yolo_head.pred_cls.1.weight will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10] in pretrained weight yolo_head.pred_cls.2.bias is unmatched with the shape [1] in model yolo_head.pred_cls.2.bias. And the weight yolo_head.pred_cls.2.bias will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: The shape [10, 96, 3, 3] in pretrained weight yolo_head.pred_cls.2.weight is unmatched with the shape [1, 96, 3, 3] in model yolo_head.pred_cls.2.weight. And the weight yolo_head.pred_cls.2.weight will not be loaded
[07/29 09:54:28] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/.cache/paddle/weights/ppyoloe_crn_s_80e_visdrone.pdparams
Error: /paddle/paddle/fluid/operators/gather.cu.h:62 Assertion index_value >= 0 && index_value < input_dims[j] failed.
The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater than or equal to 0, but received [0]

Traceback (most recent call last):
  File "tools/train.py", line 172, in <module>
    main()
  File "tools/train.py", line 168, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 132, in run
    trainer.train(FLAGS.eval)
  File "/home/action/paddle_develop/ppdet/engine/trainer.py", line 467, in train
    outputs = model(data)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/action/paddle_develop/ppdet/modeling/architectures/meta_arch.py", line 59, in forward
    out = self.get_loss()
  File "/home/action/paddle_develop/ppdet/modeling/architectures/yolo.py", line 124, in get_loss
    return self._forward()
  File "/home/action/paddle_develop/ppdet/modeling/architectures/yolo.py", line 88, in _forward
    yolo_losses = self.yolo_head(neck_feats, self.inputs)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/action/paddle_develop/ppdet/modeling/heads/ppyoloe_head.py", line 216, in forward
    return self.forward_train(feats, targets)
  File "/home/action/paddle_develop/ppdet/modeling/heads/ppyoloe_head.py", line 158, in forward_train
    return self.get_loss([
  File "/home/action/paddle_develop/ppdet/modeling/heads/ppyoloe_head.py", line 322, in get_loss
    self.assigner(
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 351, in _decorate_function
    return func(*args, **kwargs)
  File "/home/action/paddle_develop/ppdet/modeling/assigners/task_aligned_assigner.py", line 114, in forward
    is_in_topk = gather_topk_anchors(
  File "/home/action/paddle_develop/ppdet/modeling/assigners/utils.py", line 105, in gather_topk_anchors
    return is_in_topk * topk_mask
  File "/home/anaconda3/envs/paddle/lib/python3.8/site-packages/paddle/fluid/dygraph/math_op_patch.py", line 264, in __impl__
    return math_op(self, other_var, 'axis', axis)
OSError: (External) CUDA error(719), unspecified launch failure.
  [Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at /paddle/paddle/fluid/platform/gpu_info.cc:441)
  [operator < elementwise_mul > error]
INFO 2022-07-29 09:54:36,315 launch_utils.py:341] terminate all the procs
ERROR 2022-07-29 09:54:36,315 launch_utils.py:602] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2022-07-29 09:54:40,317 launch_utils.py:341] terminate all the procs
INFO 2022-07-29 09:54:40,317 launch.py:311] Local processes completed.
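
The thread does not identify a root cause. The log does show a classification head with a single class (shape [1]) and a gather index that must be less than [1], so one thing worth ruling out is a label set that does not fit the configured number of classes. The following is only a sketch under assumptions: it assumes COCO-format annotations and that the data loader maps the declared categories to contiguous class indices. The function name `check_coco_labels` and the sample data are hypothetical, not part of PaddleDetection.

```python
from collections import Counter


def check_coco_labels(coco, num_classes):
    """Return a list of human-readable problems found in a COCO-style dict.

    Hypothetical helper: it only inspects the annotation data and does not
    call any PaddleDetection API.
    """
    problems = []
    category_ids = [c["id"] for c in coco.get("categories", [])]

    # More declared categories than classes in the model config means some
    # class index would fall outside [0, num_classes).
    if len(category_ids) > num_classes:
        problems.append(
            f"{len(category_ids)} categories declared, but num_classes={num_classes}"
        )

    # Annotations that point at a category id missing from "categories".
    known = set(category_ids)
    unknown = Counter(
        a["category_id"]
        for a in coco.get("annotations", [])
        if a["category_id"] not in known
    )
    for cat_id, count in sorted(unknown.items()):
        problems.append(f"{count} annotation(s) use undeclared category_id={cat_id}")
    return problems


# Hypothetical in-memory sample; a real file would be read with json.load().
sample = {
    "categories": [{"id": 1, "name": "cigarette"}, {"id": 2, "name": "person"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 3},
    ],
}

for problem in check_coco_labels(sample, num_classes=1):
    print(problem)
```

An empty result does not prove the labels are fine; it only rules out these two mismatches.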

nemonameless commented 1 year ago

Single-GPU training does not need paddle.distributed.launch. Run it directly: CUDA_VISIBLE_DEVICES=0 python3.7 tools/train.py -c ${config} --amp --eval

cheesezoella commented 1 year ago

OK, thanks! Do you have any suggestions for the settings in configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml, such as the learning rate, number of epochs, and batch size? I have roughly 10,000 images.

nemonameless commented 1 year ago

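The PP-YOLOE model was trained with mixed precision on 8 GPUs. If the number of GPUs or the batch size changes, you need to adjust the learning rate according to the formula lr_new = lr_default * (batch_size_new * GPU_number_new) / (batch_size_default * GPU_number_default).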
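
As a small worked sketch of that formula, the helper below rescales a learning rate linearly with the total batch size. The function name and the numbers in the usage example are hypothetical and are not read from the config file. Treating batch_size as the per-GPU batch size is also an assumption, based on the formula multiplying it by the GPU count.

```python
def scale_learning_rate(lr_default, batch_size_default, gpus_default,
                        batch_size_new, gpus_new):
    """Rescale the learning rate in proportion to the total batch size.

    Implements lr_new = lr_default * (batch_size_new * gpus_new)
                        / (batch_size_default * gpus_default).
    Batch sizes are assumed to be per-GPU values.
    """
    for name, value in (("batch_size_default", batch_size_default),
                        ("gpus_default", gpus_default),
                        ("batch_size_new", batch_size_new),
                        ("gpus_new", gpus_new)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    total_default = batch_size_default * gpus_default
    total_new = batch_size_new * gpus_new
    return lr_default * total_new / total_default


# Hypothetical example: a default of lr=0.01 with 8 images per GPU on 8 GPUs,
# moved to a single GPU with the same per-GPU batch size.
new_lr = scale_learning_rate(
    lr_default=0.01,
    batch_size_default=8,
    gpus_default=8,
    batch_size_new=8,
    gpus_new=1,
)
print(new_lr)
```

With these illustrative numbers the total batch size shrinks by a factor of 8, so the learning rate shrinks by the same factor. The real defaults should be taken from the config actually in use.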

PaddlePaddle / PaddleDetection

Action recognition based on human ID detection (error when training the smoking model) #6539
