PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

Training, validation, and inference all give wrong results in this environment #9177

Closed sgAlbanC closed 1 month ago

sgAlbanC commented 1 month ago

Search before asking

Please ask your question

W1017 22:44:10.468240 7276 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.7, Driver API Version: 11.4, Runtime API Version: 11.4
W1017 22:44:10.475713 7276 gpu_resources.cc:91] device: 0, cuDNN Version: 8.6.

paddlepaddle-gpu 2.4.1, Python 3.8

The training, validation, and inference steps all run without errors, but the results are wrong.

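This is the training log. I trained with the officially provided roadsign_voc dataset and config file. The same setup trains and runs normally on other machines, but it fails in this environment. Is it really an environment problem?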

[10/17 22:38:25] ppdet.engine INFO: Epoch: [0] [ 0/87] learning_rate: 0.000033 loss_xy: 0.000000 loss_wh: 0.000000 loss_obj: 0.000000 loss_cls: 1.836784 loss: 0.245663 eta: 4:05:58 batch_cost: 4.2410 data_cost: 0.0006 ips: 1.8864 images/s
[10/17 22:38:42] ppdet.engine INFO: Epoch: [0] [20/87] learning_rate: 0.000047 loss_xy: 0.000000 loss_wh: 0.000000 loss_obj: 0.000000 loss_cls: 0.000000 loss: 0.000000 eta: 0:55:40 batch_cost: 0.8018 data_cost: 0.0004 ips: 9.9779 images/s
[10/17 22:38:54] ppdet.engine INFO: Epoch: [0] [40/87] learning_rate: 0.000060 loss_xy: 0.000000 loss_wh: 0.000000 loss_obj: 0.000000 loss_cls: 0.000000 loss: nan eta: 0:46:00 batch_cost: 0.6312 data_cost: 0.0004 ips: 12.6737 images/s
[10/17 22:39:09] ppdet.engine INFO: Epoch: [0] [60/87] learning_rate: 0.000073 loss_xy: nan loss_wh: nan loss_obj: nan loss_cls: nan loss: nan eta: 0:44:20 batch_cost: 0.7276 data_cost: 0.0004 ips: 10.9949 images/s
[10/17 22:39:26] ppdet.engine INFO: Epoch: [0] [80/87] learning_rate: 0.000087 loss_xy: nan loss_wh: 0.000000 loss_obj: 0.000000 loss_cls: nan loss: nan eta: 0:44:56 batch_cost: 0.8389 data_cost: 0.0004 ips: 9.5359 images/s
[10/17 22:39:32] ppdet.utils.checkpoint INFO: Save checkpoint: output/yolov3_mobilenet_v1_roadsign
[10/17 22:39:32] ppdet.engine INFO: Eval iter: 0
[10/17 22:39:38] ppdet.engine INFO: Eval iter: 100
[10/17 22:39:42] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
[10/17 22:39:42] ppdet.metrics.metrics INFO: mAP(0.50, integral) = 0.00%
[10/17 22:39:42] ppdet.engine INFO: Total sample number: 176, averge FPS: 16.379583372727378
[10/17 22:39:42] ppdet.engine INFO: Best test bbox ap is 0.000.
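
(Editor's sketch, not part of the original post.) In the log above, the losses sit at exactly zero and then turn into nan within the first epoch. A small script can find the first step where that happens in a longer log. It assumes one ppdet.engine entry per line in the format shown; the function name `first_bad_step` is hypothetical and not part of PaddleDetection.

```python
import math
import re

# Matches loss fields such as "loss_xy: 0.000000" or "loss: nan".
LOSS_FIELD = re.compile(
    r"(loss\w*):\s*([-+]?(?:nan|inf|\d+(?:\.\d+)?(?:e[-+]?\d+)?))",
    re.IGNORECASE,
)
# Matches the "Epoch: [0] [40/87]" progress marker.
STEP_FIELD = re.compile(r"Epoch: \[(\d+)\] \[\s*(\d+)/\s*(\d+)\]")


def first_bad_step(log_lines):
    """Return (epoch, iteration, bad_fields) for the first training line
    that has a non-finite loss value, or None if every loss is finite."""
    for line in log_lines:
        step = STEP_FIELD.search(line)
        if step is None:
            continue  # not a training-progress line
        bad = [
            name
            for name, value in LOSS_FIELD.findall(line)
            if not math.isfinite(float(value))
        ]
        if bad:
            return int(step.group(1)), int(step.group(2)), bad
    return None


if __name__ == "__main__":
    sample = [
        "[10/17 22:38:42] ppdet.engine INFO: Epoch: [0] [20/87] "
        "loss_xy: 0.000000 loss_cls: 0.000000 loss: 0.000000",
        "[10/17 22:38:54] ppdet.engine INFO: Epoch: [0] [40/87] "
        "loss_xy: 0.000000 loss_cls: 0.000000 loss: nan",
    ]
    print(first_bad_step(sample))
```

Applied to the log in this post, the first non-finite value is the total `loss` at iteration 40 of epoch 0.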

sgAlbanC commented 1 month ago

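Training and validation accuracy are both 0. At inference time, the confidence scores exceed several hundred, and there are many invalid boxes.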
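
(Editor's sketch, not part of the original post.) Confidence scores in the hundreds are a useful symptom to test for, since detection scores are normally expected to lie in [0, 1]. The check below assumes each detection is a row of `[class_id, score, x1, y1, x2, y2]`; that layout and the function name `split_by_valid_score` are assumptions for illustration, so adjust the index if your output differs.

```python
import math


def split_by_valid_score(detections, score_index=1):
    """Split detection rows into (valid, invalid) by whether the score
    is a finite number inside [0, 1]."""
    valid, invalid = [], []
    for det in detections:
        score = det[score_index]
        if math.isfinite(score) and 0.0 <= score <= 1.0:
            valid.append(det)
        else:
            invalid.append(det)
    return valid, invalid


if __name__ == "__main__":
    # Made-up rows for illustration only: [class_id, score, x1, y1, x2, y2]
    rows = [
        [0, 0.92, 10.0, 20.0, 110.0, 140.0],
        [1, 347.5, 0.0, 0.0, 5.0, 5.0],
        [2, float("nan"), 1.0, 1.0, 2.0, 2.0],
    ]
    valid, invalid = split_by_valid_score(rows)
    print(len(valid), "valid,", len(invalid), "invalid")
```

If most rows land in the invalid list, the problem is more likely in the numerics of the runtime than in the model or dataset.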

sgAlbanC commented 1 month ago

PaddleDetection is 2.5, and paddledet is 2.5.

sgAlbanC commented 1 month ago

I installed paddlepaddle_gpu-2.4.1-cp38-cp38-linux_aarch64.whl on a Jetson.

sgAlbanC commented 1 month ago

The same thing happens when I use the PaddleClas suite. Could paddlepaddle_gpu-2.4.1-cp38-cp38-linux_aarch64.whl be the cause? I tried inference with the CPU version of paddlepaddle, and it works fine.
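
(Editor's sketch, not part of the original post.) Since the CPU build gives correct results and the GPU build does not, one way to narrow this down is to run the same small computation on both devices and compare. The sketch below uses `paddle.set_device`, `paddle.to_tensor`, and `paddle.matmul`; the helper names `matmul_on` and `max_abs_diff` are hypothetical. A matrix multiply exercises only one kernel, so agreement here does not clear the wheel, and a faulty convolution path would need a similar check with a conv layer. `paddle.utils.run_check()` is Paddle's built-in installation check and is worth running first.

```python
import math
import random


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two flat lists.
    Returns inf if either list contains a non-finite value."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if not all(math.isfinite(v) for v in a + b):
        return float("inf")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def matmul_on(device, matrix):
    """Compute matrix @ matrix on the given Paddle device ('cpu' or 'gpu')
    and return the result as a flat Python list."""
    import paddle  # imported lazily so max_abs_diff works without Paddle

    paddle.set_device(device)
    x = paddle.to_tensor(matrix, dtype="float32")
    return paddle.matmul(x, x).numpy().flatten().tolist()


if __name__ == "__main__":
    random.seed(0)
    matrix = [[random.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(64)]
    diff = max_abs_diff(matmul_on("cpu", matrix), matmul_on("gpu", matrix))
    # Small float32 rounding differences are expected; large or inf is not.
    print("max |cpu - gpu| =", diff)
```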

sgAlbanC commented 1 month ago

I downloaded paddlepaddle_gpu-2.4.1-cp38-cp38-linux_aarch64.whl from https://forums.developer.nvidia.com/t/paddlepaddle-for-jetson/242765.

liu-jiaxuan commented 1 month ago

You could check whether it runs successfully in the latest environment; please refer to the tutorial.

sgAlbanC commented 1 month ago

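You could check whether it runs successfully in the latest environment; please refer to the tutorial.

Why should I use PaddleX? I am using PaddleDetection with paddlepaddle-gpu directly. Training and inference work normally on the CPU, but the GPU results are wrong. Do you have any other paddlepaddle_gpu whl?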

sgAlbanC commented 1 month ago

It's resolved. I found a different 2.5 whl online and installed it, and detection now runs normally.
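
(Editor's sketch, not part of the original post.) Because the fix was swapping the wheel, it helps to confirm which builds are actually installed before and after. The snippet below reads the installed versions with the standard-library `importlib.metadata` (available from Python 3.8); the function name `installed_versions` is hypothetical, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version string,
    or None if the package is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["paddlepaddle-gpu", "paddlepaddle", "paddledet"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

This reports only the version number, not where the wheel came from, so two wheels with the same version from different sources will look identical here.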