PaddlePaddle / PaddleYOLO

🚀🚀🚀 YOLO series of PaddlePaddle implementation, PP-YOLOE+, RT-DETR, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv10, YOLOX, YOLOv5u, YOLOv7u, YOLOv6Lite, RTMDet and so on. 🚀🚀🚀
https://github.com/PaddlePaddle/PaddleYOLO
GNU General Public License v3.0

Training models not originally in PaddleDetection: loss_box, loss_cls, and mAP are all zero #11

Closed. Feng1909 closed this issue 2 years ago

Feng1909 commented 2 years ago

Search before asking

Please ask your question

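Training ppyolo, ppyolov2, ppyoloe, yolov3, and yolox all works normally. Training yolov5, yolov6mt, and yolov7 all shows the following problem: no matter how long training runs, mAP and several of the losses stay at zero.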

[08/21 23:02:22] reader WARNING: Shared memory size is less than 1G, disable shared_memory in DataLoader
W0821 23:02:22.648579 52836 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0821 23:02:22.654331 52836 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
[08/21 23:02:34] ppdet.engine INFO: Epoch: [0] [ 0/25] learning_rate: 0.000000 loss_box: 0.000000 loss_obj: 3.835117 loss_cls: 0.000000 loss: 122.723740 eta: 21:29:05 batch_cost: 10.3127 data_cost: 7.5710 ips: 3.1030 images/s
[08/21 23:03:21] ppdet.engine INFO: Epoch: [1] [ 0/25] learning_rate: 0.000033 loss_box: 0.000000 loss_obj: 3.812787 loss_cls: 0.000000 loss: 122.009171 eta: 4:25:58 batch_cost: 2.1349 data_cost: 1.6881 ips: 14.9888 images/s
[08/21 23:04:02] ppdet.engine INFO: Epoch: [2] [ 0/25] learning_rate: 0.000133 loss_box: 0.000000 loss_obj: 3.618636 loss_cls: 0.000000 loss: 115.796364 eta: 3:50:44 batch_cost: 1.8583 data_cost: 1.4974 ips: 17.2201 images/s
[08/21 23:04:58] ppdet.engine INFO: Epoch: [3] [ 0/25] learning_rate: 0.000300 loss_box: 0.000000 loss_obj: 3.113248 loss_cls: 0.000000 loss: 99.623932 eta: 4:02:38 batch_cost: 1.9607 data_cost: 1.6221 ips: 16.3207 images/s
[08/21 23:05:44] ppdet.engine INFO: Epoch: [4] [ 0/25] learning_rate: 0.000297 loss_box: 0.000000 loss_obj: 2.070968 loss_cls: 0.000000 loss: 66.270981 eta: 3:55:00 batch_cost: 1.8214 data_cost: 1.5075 ips: 17.5694 images/s
[08/21 23:06:30] ppdet.engine INFO: Epoch: [5] [ 0/25] learning_rate: 0.000296 loss_box: 0.000000 loss_obj: 0.311432 loss_cls: 0.000000 loss: 9.965838 eta: 3:49:56 batch_cost: 1.8020 data_cost: 1.5098 ips: 17.7583 images/s
[08/21 23:07:19] ppdet.engine INFO: Epoch: [6] [ 0/25] learning_rate: 0.000295 loss_box: 0.000000 loss_obj: 0.061897 loss_cls: 0.000000 loss: 1.980716 eta: 3:50:17 batch_cost: 1.8910 data_cost: 1.5825 ips: 16.9226 images/s
[08/21 23:08:04] ppdet.engine INFO: Epoch: [7] [ 0/25] learning_rate: 0.000294 loss_box: 0.000000 loss_obj: 0.032299 loss_cls: 0.000000 loss: 1.033568 eta: 3:45:54 batch_cost: 1.7666 data_cost: 1.4625 ips: 18.1143 images/s
[08/21 23:08:59] ppdet.engine INFO: Epoch: [8] [ 0/25] learning_rate: 0.000293 loss_box: 0.000000 loss_obj: 0.023415 loss_cls: 0.000000 loss: 0.749269 eta: 3:48:07 batch_cost: 1.8442 data_cost: 1.5283 ips: 17.3521 images/s
[08/21 23:09:50] ppdet.engine INFO: Epoch: [9] [ 0/25] learning_rate: 0.000292 loss_box: 0.000000 loss_obj: 0.019228 loss_cls: 0.000000 loss: 0.615304 eta: 3:48:10 batch_cost: 1.8959 data_cost: 1.5644 ips: 16.8784 images/s
[08/21 23:10:27] ppdet.utils.checkpoint INFO: Save checkpoint: output/yolov5_s_300e_coco
[08/21 23:10:38] ppdet.engine INFO: Eval iter: 0
[08/21 23:12:30] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[08/21 23:12:30] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 0.00%
[08/21 23:12:30] ppdet.engine INFO: Total sample number: 201, averge FPS: 1.6465216304954717
[08/21 23:12:30] ppdet.engine INFO: Best test bbox ap is 0.000.
[08/21 23:12:30] ppdet.utils.checkpoint INFO: Save checkpoint: output/yolov5_s_300e_coco
[08/21 23:12:31] ppdet.engine INFO: Epoch: [10] [ 0/25] learning_rate: 0.000291 loss_box: 0.000000 loss_obj: 0.016256 loss_cls: 0.000000 loss: 0.520206 eta: 3:42:01 batch_cost: 1.7734 data_cost: 1.4456 ips: 18.0449 images/s
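
The pattern in the log above (loss_box and loss_cls stuck at zero while loss_obj and loss decrease) can be checked mechanically. The following is a minimal sketch that assumes the `name: value` layout seen in these log lines; `parse_losses` and `always_zero` are hypothetical helper names, not part of ppdet.

```python
import re

# Matches "loss", "loss_box", "loss_obj", ... followed by ": <number>".
LOSS_FIELD = re.compile(r"\b(loss\w*): ([0-9.]+)")


def parse_losses(line):
    """Return {loss_name: value} for one training log line (empty if none)."""
    return {name: float(value) for name, value in LOSS_FIELD.findall(line)}


def always_zero(lines):
    """Return the loss names that are exactly 0.0 on every line reporting them."""
    all_zero = {}
    for line in lines:
        for name, value in parse_losses(line).items():
            all_zero[name] = all_zero.get(name, True) and value == 0.0
    return sorted(name for name, zero in all_zero.items() if zero)


if __name__ == "__main__":
    # Two lines copied from the log in this issue (timing fields trimmed).
    log = [
        "Epoch: [0] [ 0/25] learning_rate: 0.000000 loss_box: 0.000000 "
        "loss_obj: 3.835117 loss_cls: 0.000000 loss: 122.723740",
        "Epoch: [5] [ 0/25] learning_rate: 0.000296 loss_box: 0.000000 "
        "loss_obj: 0.311432 loss_cls: 0.000000 loss: 9.965838",
    ]
    print(always_zero(log))
```

Run against a full training log, this lists the loss terms that never left zero, which is the symptom reported here.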

Training command:

# YOLOv5
!python PaddleDetection/tools/train.py -c PaddleDetection/configs/yolov5/yolov5_s_300e_coco.yml  --use_vdl=true --eval

The only change was switching the dataset config that is read from coco.yml to voc.yml.

Training environment: AI Studio (100G V100 version), Paddle 2.3.1

nemonameless commented 2 years ago

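Please pull the latest code. I assume this is your own dataset in VOC format; the official VOC dataset does not show this behavior. If the dataset was built correctly, please share its distribution, for example the aspect-ratio distribution and the anchor distribution.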
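
One way to collect the aspect-ratio distribution asked for above is sketched below. It assumes standard Pascal VOC XML annotations, where each `bndbox` element holds `xmin`, `ymin`, `xmax`, and `ymax`. The function names and bucket edges are illustrative choices, not part of PaddleYOLO.

```python
import xml.etree.ElementTree as ET
from bisect import bisect_right


def box_aspect_ratios(xml_text):
    """Return (ratios, n_degenerate) for one VOC annotation.

    ratios holds width / height for every valid box; boxes with
    non-positive width or height are counted separately instead.
    """
    root = ET.fromstring(xml_text)
    ratios, degenerate = [], 0
    for box in root.iter("bndbox"):
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        width, height = xmax - xmin, ymax - ymin
        if width <= 0 or height <= 0:
            degenerate += 1
            continue
        ratios.append(width / height)
    return ratios, degenerate


def ratio_histogram(ratios, edges=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Count ratios per bucket; bucket i covers edges[i-1] <= r < edges[i]."""
    counts = [0] * (len(edges) + 1)
    for ratio in ratios:
        counts[bisect_right(edges, ratio)] += 1
    return counts


if __name__ == "__main__":
    sample = """<annotation>
      <object><name>a</name>
        <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>70</ymax></bndbox>
      </object>
      <object><name>b</name>
        <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>30</xmax><ymax>120</ymax></bndbox>
      </object>
    </annotation>"""
    ratios, bad = box_aspect_ratios(sample)
    print(ratios, bad, ratio_histogram(ratios))
```

For a whole dataset, the same functions can be applied to the text of each annotation file and the resulting lists concatenated before building the histogram. A nonzero degenerate count usually points to a labeling or conversion problem worth fixing first.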

Feng1909 commented 2 years ago

Hello, this is part of the dataset I am using: https://aistudio.baidu.com/aistudio/datasetdetail/126556

Feng1909 commented 2 years ago

The problem is resolved in the new version.