PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

(InvalidArgument) Sum of Attr(num_or_sections) must be equal to the input's size along the split dimension. #8146

Closed: guanshanjushi closed this issue 1 year ago

guanshanjushi commented 1 year ago

Search before asking

Bug Component

Training

Describe the Bug

I ran into the following problem while training RT-DETR:

INFO 2023-04-25 14:15:35,478 utils.py:148] Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
loading annotations into memory...
Done (t=0.58s)
creating index...
index created!
[04/25 14:15:37] ppdet.data.source.coco INFO: Load [4849 samples valid, 1 samples invalid] in file /home/wxp/wxp_dataset/newcityevent/验证训练集/dataset_科技部课题/train/coco_train.json.
W0425 14:15:37.660985 65296 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.0, Runtime API Version: 10.2
W0425 14:15:37.663920 65296 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[04/25 14:15:39] ppdet.utils.checkpoint INFO: ['fc.bias', 'fc.weight', 'last_conv.weight'] in pretrained weight is not used in the model, and its will not be loaded
[04/25 14:15:39] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/wxp/project_wxp/github/YOLO/PaddleDetection/pretrain_weights/PPHGNetV2_L_ssld_pretrained.pdparams
Traceback (most recent call last):
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/tools/train.py", line 204, in <module>
    main()
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/tools/train.py", line 200, in main
    run(FLAGS, cfg)
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/tools/train.py", line 153, in run
    trainer.train(FLAGS.eval)
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/engine/trainer.py", line 542, in train
    outputs = model(data)
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 60, in forward
    out = self.get_loss()
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/modeling/architectures/detr.py", line 113, in get_loss
    return self._forward()
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/modeling/architectures/detr.py", line 87, in _forward
    out_transformer = self.transformer(body_feats, pad_mask, self.inputs)
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/modeling/transformers/rtdetr_transformer.py", line 442, in forward
    get_contrastive_denoising_training_group(gt_meta,
  File "/home/wxp/project_wxp/github/YOLO/PaddleDetection/ppdet/modeling/transformers/utils.py", line 258, in get_contrastive_denoising_training_group
    dn_positive_idx = paddle.split(dn_positive_idx,
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 954, in split
    return paddle.fluid.layers.split(
  File "/home/wxp/anaconda3/lib/python3.9/site-packages/paddle/fluid/layers/nn.py", line 5097, in split
    _C_ops.split(input, out, *attrs)
ValueError: (InvalidArgument) Sum of Attr(num_or_sections) must be equal to the input's size along the split dimension. But received Attr(num_or_sections) = [80, 52], input(X)'s shape = [1638400], Attr(dim) = 0.
  [Hint: Expected sum_of_section == input_axis_dim, but received sum_of_section:132 != input_axis_dim:1638400.] (at /paddle/paddle/fluid/operators/split_op.h:100)
  [operator < split > error]

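For readers unfamiliar with the message, the precondition that failed can be restated in plain Python. This is only a sketch of the check the error describes, not Paddle's implementation: the helper name `check_split_sections` is hypothetical, and it ignores the `-1` placeholder that `paddle.split` also accepts in a section list. The numbers are the ones reported in the traceback.

```python
def check_split_sections(axis_size, sections):
    """Return the section total, or raise if it does not cover the axis.

    Hypothetical helper that mirrors the precondition quoted in the
    error message; it is not part of Paddle.
    """
    total = sum(sections)
    if total != axis_size:
        raise ValueError(
            f"sum of sections {total} != axis size {axis_size} "
            f"(sections={sections})"
        )
    return total


# Values taken from the traceback: two sections (80 and 52 entries)
# were requested from a 1-D tensor that had 1638400 entries.
try:
    check_split_sections(1638400, [80, 52])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Read this way, the split sizes themselves look plausible (80 + 52 = 132 entries across a batch of two), and it is the length of the tensor being split that is far off.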
Environment

Bug description confirmation

Are you willing to submit a PR?

guanshanjushi commented 1 year ago

I am training on my own dataset. I changed num_class, and also set batch_size=2 and worker_num=2:

worker_num: 2
TrainReader:
  sample_transforms:

EvalReader:
  sample_transforms:

TestReader:
  inputs_def:
    image_shape: [3, 640, 640]
  sample_transforms:

guanshanjushi commented 1 year ago

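Found the cause. After repeated testing, it comes down to the CUDA version of the Paddle build: training the latest RT-DETR with the CUDA 10.2 build of Paddle produces the error above, but after switching to the CUDA 11.1 build the error no longer occurs. Also make sure the cuDNN version matches the one Paddle requires. May no one be tormented by environment configuration.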
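The startup log in the report already shows the combination in question: compute capability 8.6 with a 10.2 runtime. Below is a minimal sketch that parses that log line and flags such a pairing. The function names and the `MIN_CUDA_FOR_CAPABILITY` table are assumptions for illustration, not part of Paddle or PaddleDetection. The table reflects NVIDIA's published toolkit support (compute capability 8.0 from CUDA 11.0, 8.6 from CUDA 11.1). Whether this is the exact mechanism behind the bad split is not confirmed in this thread.

```python
import re

# Assumption: first CUDA toolkit release that supports each compute
# capability, taken from NVIDIA's toolkit release notes. Extend as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}

# Matches the "Please NOTE" line Paddle prints at startup, as seen in the log.
LOG_PATTERN = re.compile(
    r"GPU Compute Capability: (\d+)\.(\d+), "
    r"Driver API Version: (\d+)\.(\d+), "
    r"Runtime API Version: (\d+)\.(\d+)"
)


def parse_gpu_log_line(line):
    """Extract (major, minor) tuples from the startup line, or None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    nums = [int(group) for group in match.groups()]
    return {
        "capability": (nums[0], nums[1]),
        "driver": (nums[2], nums[3]),
        "runtime": (nums[4], nums[5]),
    }


def runtime_too_old(info):
    """True if the CUDA runtime predates support for this GPU (per the table)."""
    required = MIN_CUDA_FOR_CAPABILITY.get(info["capability"])
    return required is not None and info["runtime"] < required


line = (
    "W0425 14:15:37.660985 65296 gpu_resources.cc:61] Please NOTE: device: 0, "
    "GPU Compute Capability: 8.6, Driver API Version: 12.0, "
    "Runtime API Version: 10.2"
)
info = parse_gpu_log_line(line)
print(info, runtime_too_old(info))
```

Running a check like this against the first lines of a training log is a cheap way to catch a mismatched build before debugging model code.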