Open zhaoxh16 opened 1 year ago
Hi, I think it's because the line max_length=4096//len(all_docs)
sets the max length of the output summary based on your batch size.
For example, 4096 // 4 = 1024, 4096 // 8 = 512, etc.
Therefore, different batch sizes can produce slightly different output summaries (with different lengths). ROUGE is sensitive to summary length, which would cause minor differences between your result and the reported scores.
Hope this helps.
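To illustrate the point about ROUGE being sensitive to summary length, here is a minimal sketch. It uses a simplified unigram-overlap F-measure written from scratch as a stand-in, not the scorer from Evaluation_Example.ipynb; the function name rouge1_f and the example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure on lowercased whitespace tokens (illustrative only)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Clipped unigram overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and candidate summaries.
reference = "the storm closed roads and schools across the region on monday"
full = "the storm closed roads and schools on monday"
truncated = " ".join(full.split()[:4])  # same summary cut to its first 4 tokens

score_full = rouge1_f(reference, full)
score_truncated = rouge1_f(reference, truncated)
print(f"full: {score_full:.3f}, truncated: {score_truncated:.3f}")
```

Both candidates contain only words from the reference, so precision is 1.0 in each case; the truncated one scores lower purely because its recall drops.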
Sorry, I don't quite understand what you mean. I think that the line max_length=4096//len(all_docs)
sets the max length of each input document based on the number of documents in each data example, which is unrelated to both batch size and summary length.
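The reading above can be sketched as follows. This is an assumption-laden illustration, not the repository's actual preprocessing: truncate_documents is a hypothetical helper, and whitespace tokens stand in for whatever tokenizer the model uses. Only the expression 4096 // len(all_docs) comes from the thread.

```python
def truncate_documents(all_docs, total_budget=4096):
    """Give each source document an equal share of the input budget.

    Hypothetical helper: whitespace tokens approximate model tokens.
    """
    # The per-document limit depends only on how many documents this example has.
    max_length = total_budget // len(all_docs)
    return [doc.split()[:max_length] for doc in all_docs]


# One example with 4 long documents, another with 8.
four_docs = ["word " * 2000] * 4
eight_docs = ["word " * 2000] * 8

lengths_four = [len(d) for d in truncate_documents(four_docs)]
lengths_eight = [len(d) for d in truncate_documents(eight_docs)]
print(lengths_four[0], lengths_eight[0])
```

Under this reading, the limit varies with the number of source documents in a single example, so changing the batch size would not change how any individual example is truncated.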
Hi, thank you for sharing. I ran into trouble running inference on the multinews dataset. I followed the code in Evaluation_Example.ipynb with "use_stemmers=True" to test on the multinews test set, but got ROUGE scores of mid rouge1 fmeasure=49.87, mid rouge2 fmeasure=20.61, and mid rouge-L fmeasure=25.59, which are lower than your reported results. Could you please tell me how to solve the problem? Thank you.
Here is my code.
And the result is