Why is there a big gap between my ROUGE evaluation results and the paper's on single-document summarization?

yangmuli78 commented 1 year ago

My ROUGE installation should be fine, as I have no problem at all with the CNN/DailyMail dataset, but my ROUGE scores on the Multi-News dataset are: ROUGE-1 = 40.4, ROUGE-2 = 15.7, ROUGE-L = 35.5

------------------ Original message ------------------ From: "Danqing @.>; Sent: Wednesday, August 17, 2022, 4:10 PM; To: @.>; Cc: @.>; @.>; Subject: Re: [dqwang122/HeterSumGraph] Question about R1, R2, RL score (Issue #32)

Yes, I get a ROUGE score on the published output and a 6% difference on the multipurpose news dataset from the data listed by the author

What does "multipurpose news dataset" refer to? Is it Multi-News? What exactly is "a ROUGE score"? Is it R1 40.4? If you cannot get the reported scores (R1 46.05) from the released outputs, you should check your ROUGE installation. You can follow the instructions here (https://github.com/dqwang122/HeterSumGraph#rouge-installation). You should also recheck the data format and preprocessing.


Originally posted by @suwu-suwu in https://github.com/dqwang122/HeterSumGraph/issues/32#issuecomment-1217666084

dqwang122 commented 1 year ago

What ROUGE scores did you get on CNN/DM? The issue you quoted above reports no problem on the single-document summarization (CNN/DM) dataset.

If you cannot get the results reported in the paper, please check the following things:

1. Download the output files linked in the README, then use the utils.pyrouge_score_all() function to compare the outputs with the ground truth. If you cannot get the same ROUGE scores, there is something wrong with your ROUGE installation. Please check this section (https://github.com/dqwang122/HeterSumGraph#rouge-installation).
2. If you can get the same ROUGE scores from our released outputs, try to reproduce those outputs from the checkpoint we released. If that fails, there is something wrong with the command you used for evaluation or the hyperparameters you set. Please double-check these against the README and our paper.
3. If you pass the previous two steps but have problems when training from scratch, check the training process and the data preprocessing. You can also use the preprocessed data we provided to find out whether the problem lies in the data.
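The three checks above work as an elimination sequence: each one is only meaningful once the previous one has passed. A minimal sketch of that logic follows. The names `within_tolerance` and `diagnose` and the tolerance of 0.1 ROUGE points are assumptions made for illustration; they are not part of the HeterSumGraph code. The score dictionaries are expected to come from whatever scorer you use (for example a wrapper around utils.pyrouge_score_all(), whose exact signature is not shown in this thread).

```python
from typing import Dict

Scores = Dict[str, float]


def within_tolerance(got: Scores, expected: Scores, tol: float = 0.1) -> bool:
    """True if every expected metric is matched within `tol` ROUGE points."""
    return all(abs(got[name] - expected[name]) <= tol for name in expected)


def diagnose(released: Scores, reproduced: Scores, trained: Scores,
             reported: Scores, tol: float = 0.1) -> str:
    """Return the first stage whose scores deviate from the reported ones.

    released   -- scores computed on the authors' released output files
    reproduced -- scores from outputs decoded with the released checkpoint
    trained    -- scores from a model trained from scratch
    reported   -- scores listed in the paper
    """
    if not within_tolerance(released, reported, tol):
        return "rouge_installation"
    if not within_tolerance(reproduced, reported, tol):
        return "evaluation_command_or_hyperparameters"
    if not within_tolerance(trained, reported, tol):
        return "training_or_preprocessing"
    return "ok"


# Illustration with the R1 values quoted in this thread (46.05 reported, 40.4 observed).
reported = {"R1": 46.05}
print(diagnose({"R1": 40.4}, reported, reported, reported))
```

If the first stage already fails, as in the illustration, the later stages tell you nothing yet, so fix the scorer setup before looking at checkpoints or training.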

yangmuli78 commented 1 year ago

Thank you for your answer. Could you provide me with a training dataset containing a "summary" key? Thank you very much!

yangmuli78 commented 1 year ago

My email address is ncga_yangwei@163.com. If possible, please send the training dataset to this address. Thank you!

yangmuli78 commented 1 year ago

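So, to confirm: when I run the test data through the checkpoints you released, the results only match the scores in the paper if the ROUGE metrics are computed with utils.pyrouge_score_all(), not with rouge.get_scores(). Is that what you mean? Thank you for your answer.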
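For context on why two scorers may disagree: ROUGE implementations can differ in tokenization, stemming and how ROUGE-L is aggregated, so numbers from different tools are not necessarily comparable. The sketch below is a simplified ROUGE-1 F1 written only for illustration. It is not the official ROUGE-1.5.5 script, nor the implementation behind either function mentioned above, and `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer. It shows how a single preprocessing choice changes the score on the same pair of texts.

```python
from collections import Counter
from typing import Callable


def rouge1_f(hyp: str, ref: str,
             normalize: Callable[[str], str] = lambda w: w) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on whitespace tokens."""
    hyp_counts = Counter(normalize(w) for w in hyp.lower().split())
    ref_counts = Counter(normalize(w) for w in ref.lower().split())
    overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a stemmer: drop a trailing 's' on longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


hyp = "the model summarizes documents"
ref = "the models summarize document"
print(rouge1_f(hyp, ref))            # without stemming
print(rouge1_f(hyp, ref, toy_stem))  # with the toy stemmer
```

Because of effects like this, a comparison against the paper is only meaningful when the same scorer and settings are used on both sides.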

dqwang122/HeterSumGraph, Issue #38