Questions about mAP scores on the FLIR Dataset

zjh21 commented 1 year ago

📚 Documentation Issue

Thank you for your great work. Still, there are some issues that concern me, especially regarding the FLIR Dataset.

1) The text next to Table 4 says 'our ProbEn increases AP from prior art 74.6% to 84.4%!'. However, the tables show that ProbEn scores 83.76 on FLIR.

2) The paper says that, on FLIR, 'Compared to the single-modal detector (Thermal), our learning-based early-fusion (EarlyFusion) and mid-fusion (MidFusion) produce better performance.' In Table 3, however, Early Fusion has 78.8 mAP while Thermal has 79.24 mAP. In Table 4, Early Fusion has a higher mAP on each of the three categories yet a lower mAP on 'all', which is confusing.

3) With due respect, I'd like to point out that methods like CFR and GAFF are trained and tested on the FLIR_align Dataset provided by the CFR paper rather than on the original FLIR. Although the original FLIR might be a more difficult dataset, it is not really appropriate to compare the mAP scores of CFR and GAFF directly with those of ProbEn.

Provide a link to an existing documentation/comment/tutorial:

ProbEn: https://arxiv.org/pdf/2104.02904v3.pdf
CFR: https://arxiv.org/pdf/2009.12664v1.pdf
GAFF: https://openaccess.thecvf.com/content/WACV2021/papers/Zhang_Guided_Attentive_Feature_Fusion_for_Multispectral_Pedestrian_Detection_WACV_2021_paper.pdf

How should the above documentation/comment/tutorial improve: I would be very grateful if you could explain the first and second issues.
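
On point 2, a quick arithmetic check may help frame the question. If 'all' were the unweighted mean of the three per-category APs, a method that is higher on every category could not be lower on 'all'. The sketch below illustrates this with made-up placeholder numbers (not the values from Table 4); the function names and category keys are hypothetical, and this thread does not state how 'all' is actually computed in the paper.

```python
def macro_average(per_class_ap):
    """Unweighted mean of per-category AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


def dominates(a, b):
    """True if method `a` has strictly higher AP than `b` on every category of `b`."""
    return all(a[c] > b[c] for c in b)


# Hypothetical placeholder numbers, NOT taken from the paper's tables.
early_fusion = {"cat_a": 70.0, "cat_b": 50.0, "cat_c": 80.0}
thermal = {"cat_a": 69.0, "cat_b": 49.0, "cat_c": 79.0}

# Early fusion is higher on every category in this toy example.
assert dominates(early_fusion, thermal)

# Under a plain macro average, the dominating method must also win overall.
assert macro_average(early_fusion) > macro_average(thermal)
```

So a lower 'all' score alongside higher per-category scores would suggest either that 'all' is not a plain average of the three categories (for example, it might be computed over pooled detections) or that one of the table entries is a typo. Both are only guesses.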

Hiram1026 commented 1 year ago

Same question here.

Hiram1026 commented 1 year ago

@zjh21 Hello, I would like to get in touch with you. This is my email: hiram@std.uestc.edu.cn

Jamie725 commented 1 year ago

Thanks for pointing them out! I am sorry for the confusion. Here are some replies to your questions:

1) In the paper: "our ProbEn increases AP from prior art 74.6% to 84.4%". The 84.4% figure dates from before we added the Gaussian negative log-likelihood (GNLL) loss to model training. We trained two versions of the model: one without the GNLL loss in the detector, which reaches 84.4% mAP, and one with the GNLL loss in the detector, which reaches 83.76%. We wrote the first version of the paper without the GNLL loss, so it says 84.4%; we later decided to add the GNLL loss to the detectors and forgot to change the number in the caption. Thanks for pointing it out, we'll update the arXiv version with the correct number.

2) I am sorry for the confusion again. Before we added GNLL, the thermal-only detector was worse than early fusion, so we wrote that sentence. After adding GNLL, thermal-only performs better than early fusion, and we did not catch the discrepancy in the text in our later submission. We'll fix those parts in the arXiv version, thanks!

3) I'll double-check and fix those parts in the paper if needed, thanks!

Thank you very much again for pointing these out. We'll fix them soon!

Jamie725 / Multimodal-Object-Detection-via-Probabilistic-Ensembling

Questions about mAP scores on the FLIR Dataset #10
