PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

With evaluation during training enabled, an error while computing the metric aborts the program #8526

Open flytocc opened 1 year ago

flytocc commented 1 year ago

Search before asking

Bug Component

Training, Validation

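Describe the Bug

Similar issue: https://github.com/PaddlePaddle/PaddleDetection/issues/1316

Problem description

I enabled the eval option during training. When evaluation runs after the first epoch finishes, an error is raised where the metric is computed. The error message seems to indicate that the JSON file is malformed, however: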

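  1. The JSON file is generated automatically while the code runs (it stores intermediate results for the subsequent metric computation) and has nothing to do with the dataset;
  2. I tested the JSON file on its own, and json.load loads it without errors.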
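The standalone check in item 2 could look roughly like the sketch below. It is not the script that was actually used: the helper name `check_json_file` is hypothetical, and `bbox.json` is simply the file name that appears in the log. On failure it also prints the text around the reported offset, which helps show whether the file was truncated or two writes were interleaved.

```python
import json
import os


def check_json_file(path, context=40):
    """Try to parse `path` as JSON and return (ok, message).

    Hypothetical helper for illustration; not part of PaddleDetection.
    """
    with open(path, "r") as f:
        text = f.read()
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        # e.pos is the character offset at which parsing failed.
        start = max(e.pos - context, 0)
        snippet = text[start:e.pos + context]
        return False, "{} near: {!r}".format(e, snippet)
    return True, "parsed OK, top-level type: {}".format(type(data).__name__)


if __name__ == "__main__":
    # Assumes the result file is in the current working directory.
    if os.path.exists("bbox.json"):
        print(check_json_file("bbox.json"))
```

A file that parses fine after the job has exited does not rule out that it was incomplete or being rewritten at the moment `loadRes` read it.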

Steps to reproduce

python -m paddle.distributed.launch \
  tools/train.py \
  -c=configs/deformable_detr/deformable_detr_r50_1x_coco.yml \
  --eval

The error output is as follows:

[08/10 19:54:32] ppdet.engine INFO: Eval iter: 0
[08/10 19:54:45] ppdet.engine INFO: Eval iter: 100
[08/10 19:54:58] ppdet.engine INFO: Eval iter: 200

...

[08/10 20:04:53] ppdet.engine INFO: Eval iter: 4800
[08/10 20:05:07] ppdet.engine INFO: Eval iter: 4900
[08/10 20:05:28] ppdet.metrics.metrics INFO: The bbox result is saved to bbox.json.
loading annotations into memory...
Done (t=0.45s)
creating index...
index created!
[08/10 20:05:29] ppdet.metrics.coco_utils INFO: Start evaluate...
Loading and preparing results...
Traceback (most recent call last):
  File "tools/eval.py", line 208, in <module>
    main()
  File "tools/eval.py", line 204, in main
    run(FLAGS, cfg)
  File "tools/eval.py", line 159, in run
    trainer.evaluate()
  File "/root/paddlejob/workspace/code/PaddleDetection/ppdet/engine/trainer.py", line 707, in evaluate
    self._eval_with_loader(self.loader)
  File "/root/paddlejob/workspace/code/PaddleDetection/ppdet/engine/trainer.py", line 690, in _eval_with_loader
    metric.accumulate()
  File "/root/paddlejob/workspace/code/PaddleDetection/ppdet/metrics/metrics.py", line 142, in accumulate
    classwise=self.classwise)
  File "/root/paddlejob/workspace/code/PaddleDetection/ppdet/metrics/coco_utils.py", line 107, in cocoapi_eval
    coco_dt = coco_gt.loadRes(jsonfile)
  File "/usr/local/lib/python3.7/dist-packages/pycocotools/coco.py", line 320, in loadRes
    anns = json.load(f)
  File "/usr/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 3312669 (char 3312668)

Environment

Bug description confirmation

Are you willing to submit a PR?

lyuwenyu commented 1 year ago

Are you running several programs at the same time?

flytocc commented 1 year ago

Are you running several programs at the same time?

No, I only used paddle.distributed.launch, with 4 GPUs.

flytocc commented 1 year ago

It seems PaddleDetection does not support multi-GPU eval?

lyuwenyu commented 1 year ago

Right, multi-GPU evaluation is not supported. If you are interested, feel free to add it.

flytocc commented 1 year ago

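Right, multi-GPU evaluation is not supported. If you are interested, feel free to add it.

My current workaround is to name the JSON file according to the rank when saving it, so that the processes do not write to the same file. As for multi-GPU evaluation, I will open a separate issue later.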
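A minimal sketch of the rank-based naming idea, assuming that every process launched by paddle.distributed.launch would otherwise write to the same bbox.json. The function names `rank_output_path` and `save_results` are hypothetical and are not the actual patch; the rank is passed in explicitly so the snippet runs without Paddle installed.

```python
import json
import os


def rank_output_path(base_name, rank, output_dir="."):
    """Build a per-rank file name, e.g. 'bbox.json' becomes 'bbox_rank1.json'."""
    stem, ext = os.path.splitext(base_name)
    return os.path.join(output_dir, "{}_rank{}{}".format(stem, rank, ext))


def save_results(results, base_name, rank, output_dir="."):
    """Dump `results` to a file that only this rank writes to."""
    path = rank_output_path(base_name, rank, output_dir)
    with open(path, "w") as f:
        json.dump(results, f)
    return path


if __name__ == "__main__":
    # In a real multi-GPU job the rank could be obtained with
    # paddle.distributed.get_rank(); here it is hard-coded for illustration.
    fake_results = [{"image_id": 1, "category_id": 1,
                     "bbox": [0, 0, 10, 10], "score": 0.9}]
    print(save_results(fake_results, "bbox.json", rank=0))
```

This only keeps the processes from overwriting each other's files. Each rank still evaluates on its own, so it is not a substitute for real multi-GPU evaluation.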