danieldeutsch / sacrerouge

SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.
Apache License 2.0

PythonRouge does not have a scoring option #32

Open · danieldeutsch opened this issue 4 years ago

danieldeutsch commented 4 years ago

The original ROUGE script supports a scoring option, `-f A|B`, where `A` averages the scores over the models (ROUGE's term for the reference summaries) and `B` takes the maximum, i.e. uses the best model. PythonRouge should offer the same functionality. Its logic should be identical to ROUGE's, so we first need to understand the implementation details: how does ROUGE choose the "best" model? Is the maximum taken separately for each metric, or does ROUGE select one model based on a single metric and then report that model's precision, recall, and F1?
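To make the open question concrete, here is a minimal Python sketch of the candidate aggregations. It is not ROUGE's or SacreROUGE's actual code, and every name in it (`PRF`, `prf_from_counts`, `average_over_references`, `max_per_metric`, `best_single_reference`) is hypothetical. It assumes `A` is a plain mean of per-reference scores (ROUGE may instead pool n-gram counts across references before dividing), and it shows the two possible readings of `B`: an independent maximum per metric, or one reference chosen by a single selection metric whose precision, recall, and F1 are reported together. Using recall as the default selection metric is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PRF:
    """Precision, recall, and F1 of one summary against one reference (hypothetical type)."""
    precision: float
    recall: float
    f1: float


def prf_from_counts(hits: int, peer_total: int, ref_total: int) -> PRF:
    """Scores from n-gram overlap counts; zero denominators yield 0.0."""
    p = hits / peer_total if peer_total else 0.0
    r = hits / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return PRF(p, r, f)


def average_over_references(scores: List[PRF]) -> PRF:
    """Assumed reading of "-f A": mean of each metric over the references."""
    n = len(scores)
    return PRF(
        sum(s.precision for s in scores) / n,
        sum(s.recall for s in scores) / n,
        sum(s.f1 for s in scores) / n,
    )


def max_per_metric(scores: List[PRF]) -> PRF:
    """Reading 1 of "-f B": take the maximum of each metric independently."""
    return PRF(
        max(s.precision for s in scores),
        max(s.recall for s in scores),
        max(s.f1 for s in scores),
    )


def best_single_reference(scores: List[PRF], key: str = "recall") -> PRF:
    """Reading 2 of "-f B": pick one reference by `key` (default is an assumption)
    and report all of that reference's metrics together."""
    if key not in ("precision", "recall", "f1"):
        raise ValueError(f"unknown selection metric: {key}")
    # max() keeps the first reference when scores tie.
    return max(scores, key=lambda s: getattr(s, key))


if __name__ == "__main__":
    # Hypothetical summary with 10 n-grams scored against two references.
    ref1 = prf_from_counts(hits=6, peer_total=10, ref_total=20)  # P=0.6, R=0.3
    ref2 = prf_from_counts(hits=4, peer_total=10, ref_total=8)   # P=0.4, R=0.5
    scores = [ref1, ref2]
    print("A (mean):      ", average_over_references(scores))
    print("B (per metric):", max_per_metric(scores))
    print("B (one ref):   ", best_single_reference(scores, key="recall"))
```

In this toy case, `max_per_metric` reports precision 0.6 (from the first reference) together with recall 0.5 (from the second), a pair that no single reference produces, while `best_single_reference(..., key="recall")` returns the second reference's scores unchanged. Which of these, if either, matches `-f B` is what needs to be checked against the ROUGE Perl source.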