bupt-ai-cz / LLVIP

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

AP or MR #12

Open XiongZhongxia opened 2 years ago

XiongZhongxia commented 2 years ago

Thanks for your contribution!

  1. I found this benchmark while reading the CFT paper (as mentioned in issue #11). I see that your current YOLOv5 baseline on infrared data reaches 67.0 AP, which is much better than CFT (63.6). However, on log-average miss rate (lower is better), your baseline and CFT score 10.66 and 5.40 respectively. Which metric is more reasonable for demonstrating a model's capability?
  2. I also noticed that some people asked about the poor performance of image-fusion methods for pedestrian detection (like CFT), which do even worse than single-modality detectors. Have you extended your YOLOv5 baseline to take both RGB and thermal inputs? How do the results compare with the original baseline? Thanks.
XiongZhongxia commented 2 years ago

Actually, I also implemented a lightweight model (with roughly 10% of the FLOPs of your baseline) that takes multispectral inputs. It only reaches 62.7 AP but achieves 4.3 MR. Is this reasonable?

SantJay commented 2 years ago
  1. A very interesting and valuable question. The following is my own limited understanding of the issue; it may contain mistakes, and corrections are welcome.

    According to Section 3.1 of the paper Pedestrian Detection: An Evaluation of the State of the Art: 'We use the log-average miss rate to summarize detector performance, computed by averaging miss rate at nine FPPI rates evenly spaced in log-space in the range 10^(-2) to 10^0'. The log-average miss rate therefore measures the recall ability of the model within a fixed false-positive budget: the lower the value, the stronger the recall. AP, by contrast, is calculated from the precision-recall curve, so it also reflects the precision of the model. Ideally a model has both high precision and high recall, but in most cases the two trade off against each other. This may be why some models with lower AP report a lower MR (higher recall).

    It seems that most pedestrian detection tasks tend to use MR as a metric: https://paperswithcode.com/task/pedestrian-detection

  2. Sorry, we have not tried experiments using fused images for object detection.
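
To make the difference between the two metrics concrete, here is a minimal NumPy sketch of both. It is only an illustration under stated assumptions, not the evaluation code used for the LLVIP baselines: the function names and the input format (true/false-positive flags already matched at a single IoU threshold and sorted by descending confidence) are hypothetical. The miss-rate function takes the mean in log space (a geometric mean), as common implementations of this metric do, and the AP function uses all-point interpolation of the precision-recall curve.

```python
import numpy as np


def log_average_miss_rate(is_tp, n_gt, n_images):
    """Log-average miss rate over nine FPPI points in [1e-2, 1e0].

    is_tp: booleans, one per detection, sorted by descending confidence
           (assumed to be matched to ground truth beforehand).
    """
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    # Sentinel entry: with no detections kept, the miss rate is 1.
    fppi = np.concatenate(([-1.0], fp / n_images))
    miss = np.concatenate(([1.0], 1.0 - tp / n_gt))
    refs = np.logspace(-2.0, 0.0, num=9)
    # For each reference FPPI, take the miss rate at the last operating
    # point whose FPPI does not exceed it.
    sampled = np.array([miss[np.where(fppi <= r)[0][-1]] for r in refs])
    # Average in log space; clip to avoid log(0).
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))


def average_precision(is_tp, n_gt):
    """Area under the precision-recall curve (all-point interpolation)."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    recall = np.concatenate(([0.0], tp / n_gt, [1.0]))
    precision = np.concatenate(([0.0], tp / np.maximum(tp + fp, 1), [0.0]))
    # Make precision monotonically non-increasing from right to left.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum rectangle areas where recall changes.
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))


if __name__ == "__main__":
    # Toy example: 4 detections over 2 images, 4 ground-truth boxes.
    flags = [True, True, False, True]
    print("LAMR:", log_average_miss_rate(flags, n_gt=4, n_images=2))
    print("AP:  ", average_precision(flags, n_gt=4))
```

Note that the miss rate is only sampled where FPPI is at most 1, so it ignores the low-confidence tail of the detections, while AP integrates over the whole precision-recall curve. That is one way two detectors can rank differently under the two metrics.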
XiongZhongxia commented 2 years ago

Thanks for your inspiring answer. Here is another discussion of AP and MR: https://patrick-llgc.github.io/Learning-Deep-Learning/paper_notes/ap_mr.html. Based on your explanation and that discussion, it seems preferable to report both metrics; luckily you have provided both for subsequent research.