cannot reproduce the F1-score reported in the paper.

calx-git commented 7 months ago

I ran the inference script to reproduce the results reported in the paper. However, with the PGL-SUM and VASNet checkpoints, the output does not match the reported numbers. The results I got from the inference script are:

-----------------------------------------------------------
TEST RESULT on ckpts/vasnet/vasnet1_best_f1.pkl:
TEST MRHiSum F-score 40.968 | MAP50 0.58994 | MAP15 0.25663
-----------------------------------------------------------

-----------------------------------------------------------
TEST RESULT on ckpts/pgl_sum/pgl_sum3_best_f1.pkl:
TEST MRHiSum F-score 41.527 | MAP50 0.6173 | MAP15 0.27549
-----------------------------------------------------------

While the results reported in the paper are:

PGL-SUM: 55.89 ±0.04 (F1-Score), 61.60 ±0.14 (mAP-50%), 27.45 ±0.15 (mAP-15%)
VASNet: 55.26 ±0.05 (F1-Score), 58.69 ±0.30 (mAP-50%), 25.28 ±0.40 (mAP-15%)
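To make the gap concrete, here is a minimal sketch that parses the two log lines above and subtracts the paper's mean values. It assumes the logged MAP50/MAP15 values are fractions while the paper reports percentages; the helper names `parse` and `compare` are only for illustration and are not part of the repository.

```python
import re

# Log lines copied from the inference output above.
LOG = {
    "VASNet": "TEST MRHiSum F-score 40.968 | MAP50 0.58994 | MAP15 0.25663",
    "PGL-SUM": "TEST MRHiSum F-score 41.527 | MAP50 0.6173 | MAP15 0.27549",
}

# Mean values reported in the paper, in percent.
PAPER = {
    "VASNet": {"F-score": 55.26, "MAP50": 58.69, "MAP15": 25.28},
    "PGL-SUM": {"F-score": 55.89, "MAP50": 61.60, "MAP15": 27.45},
}


def parse(line):
    """Extract the three metrics from one log line (hypothetical helper)."""
    pairs = re.findall(r"(F-score|MAP50|MAP15) ([\d.]+)", line)
    values = {name: float(number) for name, number in pairs}
    # Assumption: the log prints mAP as a fraction, the paper as a percentage.
    values["MAP50"] *= 100
    values["MAP15"] *= 100
    return values


def compare(model):
    """Return measured minus reported value for each metric, in points."""
    measured = parse(LOG[model])
    return {name: round(measured[name] - PAPER[model][name], 2) for name in measured}


for model in LOG:
    print(model, compare(model))
```

Under that assumption, both mAP metrics land within half a point of the paper, while the F-score is more than 14 points lower for both models.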

Would you please help check what the problem is?

JinhwanSul commented 3 weeks ago

Hello,

I cannot pinpoint the exact problem from your issue report. However, our code and checkpoints work without problems on our machine.

Some potential causes:

Did you download the full yt8m dataset using our script?
Could you double-check the F1-score evaluation part of your code? Your F1-score is the only metric that differs from our results.
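
As a reference point when checking the F1 evaluation, here is a minimal sketch of the generic overlap-based F-score between a predicted and a ground-truth selection. This is not the repository's evaluation code; the function name `f1_score_percent` and the 0/1 per-frame flag format are assumptions made for illustration.

```python
def f1_score_percent(pred, gt):
    """F-score (in percent) between two equal-length sequences of 0/1 flags.

    pred: predicted selection per frame (1 = selected)
    gt:   ground-truth selection per frame (1 = selected)
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Frames selected by both the prediction and the ground truth.
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    if overlap == 0:
        return 0.0

    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for g in gt if g)
    return 100.0 * 2 * precision * recall / (precision + recall)


# One of two predicted frames overlaps one of two ground-truth frames.
print(f1_score_percent([1, 1, 0, 0], [1, 0, 1, 0]))
```

Comparing a local evaluation against a small hand-checked case like this can help tell whether a gap comes from the metric code or from the data.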

Thanks,

Jinhwan

Yu1stra commented 5 days ago

I have the same problem. If you find the cause, please let me know. Thank you!
