bheinzerling / pyrouge

A Python wrapper for the ROUGE summarization evaluation package
MIT License

Is this result normal? #23

Open huyingxi opened 6 years ago

huyingxi commented 6 years ago

Recently I was doing research related to text summarization and trained a simple model. I want to use ROUGE to check the validity of the model, and I got the following results.

```
1 ROUGE-1 Average_R: 0.41775
1 ROUGE-1 Average_P: 0.39336
1 ROUGE-1 Average_F: 0.39289
1 ROUGE-2 Average_R: 0.18253
1 ROUGE-2 Average_P: 0.17314
1 ROUGE-2 Average_F: 0.17203
1 ROUGE-3 Average_R: 0.10546
1 ROUGE-3 Average_P: 0.10178
1 ROUGE-3 Average_F: 0.10011
1 ROUGE-4 Average_R: 0.07039
1 ROUGE-4 Average_P: 0.06904
1 ROUGE-4 Average_F: 0.06724
...
```
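For context, this is the standard ROUGE-1.5.5 report format. A minimal pyrouge run that produces output in this shape looks roughly like the sketch below, following the usage from the README; the directories and filename patterns are placeholders, not my actual setup:

```python
from pyrouge import Rouge155

# Minimal sketch of the standard pyrouge usage.
# Directories and filename patterns below are placeholders.
r = Rouge155()
r.system_dir = 'path/to/system_summaries'   # summaries produced by the model
r.model_dir = 'path/to/model_summaries'     # reference (gold) summaries
r.system_filename_pattern = r'some_name.(\d+).txt'
r.model_filename_pattern = r'some_name.[A-Z].#ID#.txt'

output = r.convert_and_evaluate()
print(output)  # prints the "1 ROUGE-1 Average_R: ..." report shown above
```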

The output shows that each ROUGE F-score is smaller than the corresponding precision and recall. Does anyone know why? Is this result normal?
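A possibly relevant detail: for any single document, F1 is the harmonic mean of that document's P and R, so it always lies between them. But ROUGE reports averages over documents, and the average of per-document F1 scores can fall below both the average P and the average R. A quick sketch with made-up per-document values shows the effect:

```python
# Sketch with hypothetical numbers: per-document F1 always lies between
# that document's P and R, but the *average* of per-document F1 scores
# can fall below both the average P and the average R.

def f1(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

# Two hypothetical documents with opposite P/R imbalances.
docs = [(1.0, 0.1), (0.1, 1.0)]  # (precision, recall) pairs

avg_p = sum(p for p, _ in docs) / len(docs)          # 0.55
avg_r = sum(r for _, r in docs) / len(docs)          # 0.55
avg_f = sum(f1(p, r) for p, r in docs) / len(docs)   # ~0.18

print(avg_p, avg_r, avg_f)  # average F1 is below both average P and average R
```

So if ROUGE averages per-document F scores this way, an Average_F below both Average_P and Average_R would not by itself indicate a bug.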