PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

Faster RCNN + FPN training loss is 0 #5359

Closed qingchunnianshao closed 2 years ago

qingchunnianshao commented 2 years ago

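I am using my own dataset, converted to VOC format. I have confirmed that the data itself is fine, because training with the faster_rcnn_r50_1x_coco.yml config runs normally. After adding FPN, loss_rpn_reg, loss_bbox_cls and loss_bbox_reg all become 0. Are there any suggestions for how to troubleshoot this?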

Paddle version: 2.2.1; PaddleDetection version: 2.3.0

OS: Ubuntu 18.04
Python version: 3.8.5
CUDA version: 11.2; cuDNN version: 8.0
GPU: GeForce RTX 3060 with 12 GB of memory

Running Faster RCNN + FPN with faster_rcnn_r50_fpn_1x_coco.yml:
_BASE_: [ '../datasets/roadsign_voc.yml', '../runtime.yml', '_base_/optimizer_1x_bydouble.yml', '_base_/faster_rcnn_r50_fpn_bydouble.yml', '_base_/faster_fpn_reader_bydouble.yml', ]

[03/14 10:20:33] ppdet.engine INFO: Epoch: [0] [ 0/3558] learning_rate: 0.000100 loss_rpn_cls: 0.696384 loss_rpn_reg: 0.000000 loss_bbox_cls: 2.489519 loss_bbox_reg: 0.000000 loss: 3.185904 eta: 6 days, 2:30:41 batch_cost: 1.8530 data_cost: 0.0003 ips: 0.5397 images/s
[03/14 10:21:05] ppdet.engine INFO: Epoch: [0] [ 40/3558] learning_rate: 0.000136 loss_rpn_cls: 0.658481 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.016879 loss_bbox_reg: 0.000000 loss: 0.668610 eta: 2 days, 17:42:57 batch_cost: 0.8057 data_cost: 0.0003 ips: 1.2411 images/s
[03/14 10:21:38] ppdet.engine INFO: Epoch: [0] [ 80/3558] learning_rate: 0.000172 loss_rpn_cls: 0.014283 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.014283 eta: 2 days, 16:48:36 batch_cost: 0.8083 data_cost: 0.0003 ips: 1.2372 images/s
[03/14 10:22:10] ppdet.engine INFO: Epoch: [0] [ 120/3558] learning_rate: 0.000208 loss_rpn_cls: 0.002416 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.002416 eta: 2 days, 16:40:08 batch_cost: 0.8149 data_cost: 0.0003 ips: 1.2272 images/s
[03/14 10:22:43] ppdet.engine INFO: Epoch: [0] [ 160/3558] learning_rate: 0.000244 loss_rpn_cls: 0.001419 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.001419 eta: 2 days, 16:32:21 batch_cost: 0.8121 data_cost: 0.0003 ips: 1.2314 images/s
[03/14 10:23:16] ppdet.engine INFO: Epoch: [0] [ 200/3558] learning_rate: 0.000280 loss_rpn_cls: 0.000719 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000719 eta: 2 days, 16:34:13 batch_cost: 0.8193 data_cost: 0.0004 ips: 1.2206 images/s
[03/14 10:23:49] ppdet.engine INFO: Epoch: [0] [ 240/3558] learning_rate: 0.000316 loss_rpn_cls: 0.000600 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000600 eta: 2 days, 16:34:42 batch_cost: 0.8185 data_cost: 0.0003 ips: 1.2217 images/s
[03/14 10:24:21] ppdet.engine INFO: Epoch: [0] [ 280/3558] learning_rate: 0.000352 loss_rpn_cls: 0.000415 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000415 eta: 2 days, 16:31:16 batch_cost: 0.8132 data_cost: 0.0003 ips: 1.2297 images/s
[03/14 10:24:54] ppdet.engine INFO: Epoch: [0] [ 320/3558] learning_rate: 0.000388 loss_rpn_cls: 0.000279 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000279 eta: 2 days, 16:28:34 batch_cost: 0.8132 data_cost: 0.0003 ips: 1.2298 images/s
[03/14 10:25:26] ppdet.engine INFO: Epoch: [0] [ 360/3558] learning_rate: 0.000424 loss_rpn_cls: 0.000287 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000287 eta: 2 days, 16:25:23 batch_cost: 0.8114 data_cost: 0.0003 ips: 1.2325 images/s
[03/14 10:25:59] ppdet.engine INFO: Epoch: [0] [ 400/3558] learning_rate: 0.000460 loss_rpn_cls: 0.000307 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000307 eta: 2 days, 16:22:22 batch_cost: 0.8106 data_cost: 0.0003 ips: 1.2337 images/s
[03/14 10:26:31] ppdet.engine INFO: Epoch: [0] [ 440/3558] learning_rate: 0.000496 loss_rpn_cls: 0.000179 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000179 eta: 2 days, 16:20:45 batch_cost: 0.8128 data_cost: 0.0003 ips: 1.2303 images/s
[03/14 10:27:04] ppdet.engine INFO: Epoch: [0] [ 480/3558] learning_rate: 0.000532 loss_rpn_cls: 0.000144 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000144 eta: 2 days, 16:19:29 batch_cost: 0.8132 data_cost: 0.0003 ips: 1.2297 images/s
[03/14 10:27:36] ppdet.engine INFO: Epoch: [0] [ 520/3558] learning_rate: 0.000568 loss_rpn_cls: 0.000124 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000124 eta: 2 days, 16:17:17 batch_cost: 0.8104 data_cost: 0.0003 ips: 1.2340 images/s
[03/14 10:28:09] ppdet.engine INFO: Epoch: [0] [ 560/3558] learning_rate: 0.000604 loss_rpn_cls: 0.000118 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000118 eta: 2 days, 16:17:18 batch_cost: 0.8162 data_cost: 0.0003 ips: 1.2251 images/s
[03/14 10:28:42] ppdet.engine INFO: Epoch: [0] [ 600/3558] learning_rate: 0.000640 loss_rpn_cls: 0.000064 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000064 eta: 2 days, 16:15:25 batch_cost: 0.8105 data_cost: 0.0003 ips: 1.2339 images/s
[03/14 10:29:14] ppdet.engine INFO: Epoch: [0] [ 640/3558] learning_rate: 0.000676 loss_rpn_cls: 0.000060 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000060 eta: 2 days, 16:12:18 batch_cost: 0.8057 data_cost: 0.0002 ips: 1.2412 images/s
[03/14 10:29:47] ppdet.engine INFO: Epoch: [0] [ 680/3558] learning_rate: 0.000712 loss_rpn_cls: 0.000117 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000117 eta: 2 days, 16:12:42 batch_cost: 0.8172 data_cost: 0.0002 ips: 1.2236 images/s
[03/14 10:30:19] ppdet.engine INFO: Epoch: [0] [ 720/3558] learning_rate: 0.000748 loss_rpn_cls: 0.000070 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000070 eta: 2 days, 16:12:04 batch_cost: 0.8137 data_cost: 0.0002 ips: 1.2289 images/s
[03/14 10:30:52] ppdet.engine INFO: Epoch: [0] [ 760/3558] learning_rate: 0.000784 loss_rpn_cls: 0.000051 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000051 eta: 2 days, 16:11:41 batch_cost: 0.8147 data_cost: 0.0002 ips: 1.2274 images/s
[03/14 10:31:25] ppdet.engine INFO: Epoch: [0] [ 800/3558] learning_rate: 0.000820 loss_rpn_cls: 0.000055 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000055 eta: 2 days, 16:10:28 batch_cost: 0.8112 data_cost: 0.0002 ips: 1.2328 images/s
[03/14 10:31:57] ppdet.engine INFO: Epoch: [0] [ 840/3558] learning_rate: 0.000856 loss_rpn_cls: 0.000022 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000022 eta: 2 days, 16:09:08 batch_cost: 0.8104 data_cost: 0.0002 ips: 1.2339 images/s
[03/14 10:32:30] ppdet.engine INFO: Epoch: [0] [ 880/3558] learning_rate: 0.000892 loss_rpn_cls: 0.000019 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000019 eta: 2 days, 16:08:56 batch_cost: 0.8154 data_cost: 0.0003 ips: 1.2264 images/s
[03/14 10:33:02] ppdet.engine INFO: Epoch: [0] [ 920/3558] learning_rate: 0.000928 loss_rpn_cls: 0.000021 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000021 eta: 2 days, 16:08:33 batch_cost: 0.8146 data_cost: 0.0003 ips: 1.2276 images/s
[03/14 10:33:35] ppdet.engine INFO: Epoch: [0] [ 960/3558] learning_rate: 0.000964 loss_rpn_cls: 0.000017 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000017 eta: 2 days, 16:07:48 batch_cost: 0.8129 data_cost: 0.0003 ips: 1.2302 images/s
[03/14 10:34:08] ppdet.engine INFO: Epoch: [0] [1000/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000015 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000015 eta: 2 days, 16:07:23 batch_cost: 0.8145 data_cost: 0.0003 ips: 1.2278 images/s
[03/14 10:34:40] ppdet.engine INFO: Epoch: [0] [1040/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000020 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000020 eta: 2 days, 16:06:57 batch_cost: 0.8145 data_cost: 0.0002 ips: 1.2277 images/s
[03/14 10:35:13] ppdet.engine INFO: Epoch: [0] [1080/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000016 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000016 eta: 2 days, 16:07:49 batch_cost: 0.8219 data_cost: 0.0044 ips: 1.2167 images/s
[03/14 10:35:46] ppdet.engine INFO: Epoch: [0] [1120/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000009 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000009 eta: 2 days, 16:06:37 batch_cost: 0.8103 data_cost: 0.0002 ips: 1.2341 images/s
[03/14 10:36:18] ppdet.engine INFO: Epoch: [0] [1160/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000008 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000008 eta: 2 days, 16:06:36 batch_cost: 0.8172 data_cost: 0.0002 ips: 1.2237 images/s
[03/14 10:36:51] ppdet.engine INFO: Epoch: [0] [1200/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000011 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000011 eta: 2 days, 16:06:23 batch_cost: 0.8163 data_cost: 0.0003 ips: 1.2251 images/s
[03/14 10:37:24] ppdet.engine INFO: Epoch: [0] [1240/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000009 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000009 eta: 2 days, 16:05:26 batch_cost: 0.8116 data_cost: 0.0002 ips: 1.2322 images/s
[03/14 10:37:56] ppdet.engine INFO: Epoch: [0] [1280/3558] learning_rate: 0.001000 loss_rpn_cls: 0.000006 loss_rpn_reg: 0.000000 loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.000006 eta: 2 days, 16:04:39 batch_cost: 0.8125 data_cost: 0.0002 ips: 1.2308 images/s
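
To see at a glance which loss terms never leave zero in a log like the one above, a small parser can help. This is only a sketch that assumes the `name: value` layout shown in these log lines; the helper names `parse_line` and `always_zero` are made up for illustration and are not part of PaddleDetection.

```python
import re

# Assumption: each log line has "loss_xxx: <float>" pairs and an
# iteration counter such as "[ 80/3558]", as in the log above.
LOSS_RE = re.compile(r"(loss\w*): ([0-9.]+)")
ITER_RE = re.compile(r"\[\s*(\d+)/\d+\]")


def parse_line(line):
    """Return (iteration, {loss_name: value}), or None if no counter is found."""
    counter = ITER_RE.search(line)
    if counter is None:
        return None
    losses = {name: float(value) for name, value in LOSS_RE.findall(line)}
    return int(counter.group(1)), losses


def always_zero(lines):
    """Return the sorted names of loss terms that are 0 in every parsed line."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    names = set()
    for _, losses in parsed:
        names.update(losses)
    return sorted(
        name for name in names
        if all(losses.get(name, 0.0) == 0.0 for _, losses in parsed)
    )


# Two lines copied from the log above (iterations 0 and 80), shortened
# to the fields the parser reads.
SAMPLE = [
    "[03/14 10:20:33] ppdet.engine INFO: Epoch: [0] [ 0/3558] "
    "learning_rate: 0.000100 loss_rpn_cls: 0.696384 loss_rpn_reg: 0.000000 "
    "loss_bbox_cls: 2.489519 loss_bbox_reg: 0.000000 loss: 3.185904",
    "[03/14 10:21:38] ppdet.engine INFO: Epoch: [0] [ 80/3558] "
    "learning_rate: 0.000172 loss_rpn_cls: 0.014283 loss_rpn_reg: 0.000000 "
    "loss_bbox_cls: 0.000000 loss_bbox_reg: 0.000000 loss: 0.014283",
]

if __name__ == "__main__":
    print(always_zero(SAMPLE))
```

On these two lines the regression terms are zero from the very first iteration, while loss_bbox_cls starts non-zero and only later drops to 0.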

ghostxsl commented 2 years ago

You can refer to the following config for how to add FPN: https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml Please check whether FPN was added incorrectly.
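
One possible way to do that check is to diff the customized config text against the reference config line by line. The sketch below uses only Python's standard `difflib`; the function name `config_diff` and the two sample strings are made-up placeholders, not the contents of any real PaddleDetection file.

```python
import difflib


def config_diff(reference_text, custom_text):
    """Return only the lines that differ between two config texts.

    Lines prefixed with "- " exist only in the reference,
    lines prefixed with "+ " exist only in the custom config.
    Blank lines and trailing whitespace are ignored.
    """
    ref = [l.rstrip() for l in reference_text.splitlines() if l.strip()]
    own = [l.rstrip() for l in custom_text.splitlines() if l.strip()]
    return [d for d in difflib.ndiff(ref, own) if d[:2] in ("- ", "+ ")]


# Placeholder texts for illustration only (hypothetical keys and values).
REFERENCE = "neck: FPN\nout_channel: 256\n"
CUSTOM = "neck: FPN\nout_channel: 1024\n"

if __name__ == "__main__":
    for line in config_diff(REFERENCE, CUSTOM):
        print(line)
```

In practice the two strings would be read from the reference yml files and from the customized `_bydouble` copies, one pair of files at a time, so that every intentional or accidental change shows up explicitly.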