PPYOLOE _bbox_loss: "ValueError: Target -6 is out of lower bound" when computing the loss while training on a custom dataset

YJH1108 commented 1 month ago

Search before asking

[X] I have searched the issues and found no similar bug report.

Bug Component

Training

Describe the Bug

When training PPYOLOE on my own dataset, the following error is raised while computing bbox_loss:

```
Traceback (most recent call last):
  File ".\tools\train.py", line 211, in <module>
    main()
  File ".\tools\train.py", line 207, in main
    run(FLAGS, cfg)
  File ".\tools\train.py", line 160, in run
    trainer.train(FLAGS.eval)
  File "E:\jingsai\PaddleDetection\ppdet\engine\trainer.py", line 577, in train
    outputs = model(data)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 60, in forward
    out = self.get_loss()
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 147, in get_loss
    return self._forward()
  File "E:\jingsai\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 93, in _forward
    yolo_losses = self.yolo_head(neck_feats, self.inputs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs, **kwargs)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 264, in forward
    return self.forward_train(feats, targets, aux_pred)
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 198, in forward_train
    return self.get_loss([
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 455, in get_loss
    assign_out_dict = self.get_loss_from_assign(
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 500, in get_loss_from_assign
    self._bbox_loss(pred_distri, pred_bboxes, anchor_points_s,
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 364, in _bbox_loss
    loss_dfl = self._df_loss(pred_dist_pos, assigned_ltrb_pos,
  File "E:\jingsai\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 319, in _df_loss
    loss_left = F.cross_entropy(
  File "E:\jingsai\PaddleDetection\venv_pd\lib\site-packages\paddle\nn\functional\loss.py", line 1719, in cross_entropy
    raise ValueError("Target {} is out of lower bound.".format(
ValueError: Target -1 is out of lower bound.
```

The failing line is this one in ppyoloe_head.py: "loss_dfl = self._df_loss(pred_dist_pos, assigned_ltrb_pos, self.reg_range[0]) * bbox_weight"

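I printed the two variables pred_dist_pos and assigned_ltrb_pos and found that assigned_ltrb_pos frequently contains large values.

I'm not sure whether this is a bug or whether I'm missing some parameter setting needed for training on a custom dataset. Also, what exactly do pred_dist_pos and assigned_ltrb_pos represent?

Any help would be appreciated.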
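For context on the question above: judging from the variable names and the traceback (not verified against a specific PaddleDetection release), pred_dist_pos appears to hold, for each positive anchor and each of the four box sides, logits over reg_max + 1 discrete distance bins, and assigned_ltrb_pos the matching ground-truth distances from the anchor point to the assigned box edges (presumably in units of the feature-map stride). Distribution Focal Loss (DFL) turns each continuous target t into two integer class indices, floor(t) and floor(t) + 1, and feeds them to cross-entropy. The NumPy sketch below is a hypothetical re-implementation of that general DFL formulation, not PaddleDetection's code; the names `dfl_sketch` and `log_softmax` are illustrative. It shows why a target such as -1, 28 or 60 can never be a valid class index.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis (the bin axis).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def dfl_sketch(pred_dist, target, lower_bound=0):
    """Hypothetical Distribution Focal Loss for one box side.

    pred_dist: (N, num_bins) logits, e.g. num_bins = reg_max + 1 = 17.
    target:    (N,) continuous distances, e.g. one column of assigned_ltrb_pos.
    """
    num_bins = pred_dist.shape[-1]
    t = np.asarray(target, dtype=np.float64) - lower_bound  # shift into bin coordinates
    left = np.floor(t).astype(np.int64)   # integer bin just below the target
    right = left + 1                      # integer bin just above the target
    # Both bins are used as cross-entropy class indices, so they must lie in
    # [0, num_bins). Negative or very large targets violate this, which is
    # the condition behind "Target ... is out of lower bound".
    if left.min() < 0 or right.max() >= num_bins:
        raise ValueError(
            f"DFL target out of range: need 0 <= t < {num_bins - 1}, "
            f"got min={t.min()}, max={t.max()}"
        )
    # Linear-interpolation weights: the closer bin gets the larger weight.
    weight_left = right - t
    weight_right = 1.0 - weight_left
    logp = log_softmax(pred_dist)
    rows = np.arange(t.shape[0])
    return -(logp[rows, left] * weight_left + logp[rows, right] * weight_right)


logits = np.zeros((2, 17))  # uniform logits over 17 bins
print(dfl_sketch(logits, [3.25, 0.0]))  # both entries equal log(17)
```

With 17 bins, any target in [0, 16) is valid; a value like 16.0, 28.0 or -1.0 raises, which is consistent with the error only appearing once assigned_ltrb_pos contains out-of-range values.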

复现环境 Environment

nothing

Bug description confirmation

[X] I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

[X] I'd like to help by submitting a PR!

YJH1108 commented 1 month ago

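Debugging further, I found that the values of assigned_ltrb are normal: they lie within reg_range (0 to 17 by default). So why do out-of-range values appear after masked_select? For example, in my screenshot assigned_ltrb_pos contains 28, 60, 92, ..., and even negative values.

My understanding is that masked_select only picks values from the original tensor according to the mask. Is that understanding wrong?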
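That understanding can be checked independently of Paddle. The sketch below spells out the expected masked_select semantics with NumPy boolean indexing (elements of x where mask is True, flattened in row-major order) and adds a weaker sanity check: a pure selection can never produce a value outside the range of its input. The helper names are hypothetical, and the sketch assumes x and mask have the same shape; Paddle tensors would first be converted with .numpy().

```python
import numpy as np


def masked_select_reference(x, mask):
    """Expected semantics: copy the elements of x where mask is True into
    a 1-D array, in row-major order. Assumes x and mask share a shape."""
    x = np.asarray(x)
    mask = np.asarray(mask, dtype=bool)
    if x.shape != mask.shape:
        raise ValueError(f"shape mismatch: x {x.shape} vs mask {mask.shape}")
    return x[mask]


def matches_reference(x, mask, y):
    """True if y (e.g. a masked_select result) equals the reference output exactly."""
    expected = masked_select_reference(x, mask)
    return np.array_equal(expected, np.asarray(y).reshape(-1))


def values_come_from_source(x, y):
    """Weaker check: selected values must lie within [min(x), max(x)]."""
    y = np.asarray(y)
    return y.size == 0 or (y.min() >= np.min(x) and y.max() <= np.max(x))


x = np.array([[1.0, -2.0], [3.0, -4.0]])
mask = x >= 0
print(masked_select_reference(x, mask))         # [1. 3.]
print(matches_reference(x, mask, [1.0, 28.0]))  # False: 28.0 is not in x
```

Under these semantics, values like 28 or 60 coming out of an input bounded by 0 to 17 cannot be explained by the selection itself, which points at the operator rather than at the dataset.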

YJH1108 commented 1 month ago

With the CPU build, masked_select returns the correct result. My environment: paddlepaddle-gpu 2.3.2, CUDA 11.2, cuDNN 8.2.

code:

```python
import paddle

# Minimal masked_select check: y should contain exactly the non-negative entries of x
print(paddle.__version__)
x = paddle.randn((10,))
mask = x >= 0
y = paddle.masked_select(x, mask)
print(x)
print(mask)
print(y)
```

[Screenshot of the output]

lyuwenyu commented 1 month ago

Which GPU are you using?

YJH1108 commented 1 month ago

Which GPU are you using?

RTX 3050 Ti, driver version 546.80.

To install the CPU build of Paddle I used: python -m pip install paddlepaddle==2.3.2 -i https://mirror.baidu.com/pypi/simple

To install paddlepaddle-gpu 2.3 I used: python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html

Later I found that the problem does not occur with Paddle 2.6. To install paddlepaddle-gpu 2.6: python -m pip install paddlepaddle-gpu==2.6.1.post112 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html

However, the competition I am entering only allows Paddle versions up to 2.3.

lyuwenyu commented 1 month ago

This is probably a bug in older Paddle versions that was fixed in later releases. Try changing that DFL range to [0-17].
