Why are mAP, R, and P all non-zero during training, but all 0 when running test.py?

WongKinYiu / PyTorch_YOLOv4

PyTorch implementation of YOLOv4


Why are mAP, R, and P all non-zero during training, but all 0 when running test.py? #396

Open whuabin opened 2 years ago

whuabin commented 2 years ago

During training: Snipaste_2022-03-23_15-52-19 (screenshot)

During testing: Snipaste_2022-03-23_15-50-44 (screenshot)

What is causing this?

whuabin commented 2 years ago

I am using the code from the master branch.

xiaohenghuang commented 2 years ago

Your Targets count is 0.

whuabin commented 2 years ago

Why is it non-zero during training but 0 at test time? The test images and their label files are all there.

xiaohenghuang commented 2 years ago

Training evaluates on val, which is different from test.

Are you using test.py to do the "testing"?

whuabin commented 2 years ago

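Oh, I misspoke. What I meant is that when I run test.py, the computed targets, P, R, and mAP are all 0. I just found that the problem seems to be in what the model returns.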

whuabin commented 2 years ago

Do you know what the problem is?

whuabin commented 2 years ago

I wonder whether it is a PyTorch version issue. Which versions are you using?

xiaohenghuang commented 2 years ago

But why do you need to run test.py? If you look at the code, test.py is called by train.py. For inference you would use detect.py anyway.

whuabin commented 2 years ago

No, I just want to use the trained model to compute evaluation metrics such as mAP and P.

Joker9194 commented 2 years ago

Your Targets count is 0. Check whether the labels for your test set actually match up with the images.
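A minimal sketch of such a check, assuming a YOLO-style layout in which each image under an `images` directory has a `.txt` label file at the same relative path under `labels`, with one object per non-empty line. That path convention and the helper names `label_path_for` and `count_targets` are assumptions for illustration, not code taken from the repo:

```python
from pathlib import Path


def label_path_for(image_path):
    """Map .../images/x.jpg to .../labels/x.txt (assumed convention)."""
    parts = list(Path(image_path).parts)
    # Replace only the last directory named "images".
    for i in range(len(parts) - 1, -1, -1):
        if parts[i] == "images":
            parts[i] = "labels"
            break
    return Path(*parts).with_suffix(".txt")


def count_targets(image_paths):
    """Return (total_targets, images_without_a_label_file)."""
    total = 0
    missing = []
    for img in image_paths:
        lbl = label_path_for(img)
        if not lbl.is_file():
            missing.append(str(img))
            continue
        # Each non-empty line is assumed to describe one object.
        lines = [ln for ln in lbl.read_text().splitlines() if ln.strip()]
        total += len(lines)
    return total, missing
```

If the total comes out as 0 for the test image list but not for the training list, the problem is more likely in the paths or label files than in the model.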

LiuXiaoYu2030 commented 2 years ago

Did you solve this problem?

Joker9194 commented 2 years ago

Or you need to check the .cfg. I used the wrong cfg file and ran into the same problem.

whuabin commented 2 years ago

> did you solve this problem

Not solved yet.

whuabin commented 2 years ago

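> Your Targets count is 0. Check whether the labels for your test set actually match up with the images.

They should match, because I split my dataset directly into a training set and a test set. Since the training set shows a target count, the test set should be fine too. Also, I have trained on this data with yolov5, and it showed the target count as well as P, R, mAP, and so on.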

whuabin commented 2 years ago

I never ran into this problem with the "u" (Ultralytics) versions of v3 and v5.

Joker9194 commented 2 years ago

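> > Your Targets count is 0. Check whether the labels for your test set actually match up with the images.
>
> They should match, because I split my dataset directly into a training set and a test set. Since the training set shows a target count, the test set should be fine too. Also, I have trained on this data with yolov5, and it showed the target count as well as P, R, mAP, and so on.

I have not run into this problem myself. You could debug at these two lines: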

https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/eb5f1663ed0743660b8aa749a43f35f505baa325/test.py#L100-L101

Check the value of path and the state of the dataloader.

ZeWu0307 commented 2 years ago

Has this been solved?

Joker9194 commented 2 years ago

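I ran into this problem today as well. The main cause was that the cfg file I passed to test was not the same as the cfg file I trained with. Check whether that is the cause in your case.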
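A rough sketch for spotting such a mismatch, assuming both files use the Darknet-style cfg layout (`[section]` headers followed by `key=value` lines, with `#` comments). The helpers `parse_cfg` and `diff_cfgs` are hypothetical names written for this example, not functions from the repo:

```python
def parse_cfg(text):
    """Parse Darknet-style cfg text into a list of (section, options) pairs."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            blocks.append((line[1:-1].strip(), {}))
        elif "=" in line and blocks:
            key, value = line.split("=", 1)
            blocks[-1][1][key.strip()] = value.strip()
    return blocks


def diff_cfgs(text_a, text_b):
    """Return human-readable differences between two cfg texts."""
    a, b = parse_cfg(text_a), parse_cfg(text_b)
    diffs = []
    if len(a) != len(b):
        diffs.append(f"block count differs: {len(a)} vs {len(b)}")
    for i, ((name_a, opts_a), (name_b, opts_b)) in enumerate(zip(a, b)):
        if name_a != name_b:
            diffs.append(f"block {i}: [{name_a}] vs [{name_b}]")
            continue
        for key in sorted(set(opts_a) | set(opts_b)):
            if opts_a.get(key) != opts_b.get(key):
                diffs.append(
                    f"block {i} [{name_a}] {key}: "
                    f"{opts_a.get(key)} vs {opts_b.get(key)}"
                )
    return diffs
```

Read the cfg used for training and the one passed to test.py into strings and call `diff_cfgs` on them. An empty list means the two files describe the same blocks and options.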

maidouxiaozi commented 2 years ago

I get the same result. I traced it to model inference, where the outputs are NaN.

maidouxiaozi commented 2 years ago

Has anyone found the cause and fixed this?

whuabin commented 2 years ago

> Has anyone found the cause and fixed this?

I found the problem, but I have not fixed it. Try the methods suggested above.

maidouxiaozi commented 2 years ago

@whuabin Have you solved it by now?

ISLab-Eden commented 1 year ago

Skipping the learning step in train.py may be useful. I fed the correct model, dataset, and specific weights (such as best.pt) to the code with the learning step skipped, and got the same P, R, mAP@.5, and mAP@.5:.95 values as in the training output. So one reason the output of test.py differs from the output during training is that the model setup is not identical between the two runs (for example, a hyperparameter or some other variable in the model).

lqin0818 commented 1 year ago

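I ran into the same problem. During training, the val P, R, and mAP were all normal, but when I ran test on its own with the same validation set, the results were all 0. The cause may be this: I turned off half precision for training, because my lousy GPU does not support it and it errors out. So I turned off half precision in test as well, and after that the results were no longer 0, but they were very poor.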