Maluuba / nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.
http://arxiv.org/abs/1706.09799

Incorrect output when setting the metrics_to_omit parameter #45

Closed Runingtime closed 6 years ago

Runingtime commented 6 years ago

Hi, it seems that there is a bug in the load_scorers method of the NLGEval class. For example, when running the following code,

from nlgeval import NLGEval

nlgeval = NLGEval(no_skipthoughts=True, no_glove=True, metrics_to_omit=['Bleu_1', 'Bleu_2', 'Bleu_3'])  # loads the models
metrics_dict = nlgeval.compute_metrics([references], hypothesis)
print(metrics_dict)

it gives unexpected output (Bleu_4 is missing):

{'METEOR': 0.2191196041010623, 'ROUGE_L': 0.46546672221759094, 'CIDEr': 3.10829766113145}

So, is this a real bug or did I miss something?

juharris commented 6 years ago

Yep that's intended as per the docs: "Metrics to omit. Omitting Bleu{i} will omit Bleu{j} for j>=i." See here: https://github.com/Maluuba/nlg-eval/blob/master/nlgeval/__init__.py#L161
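If what you want is Bleu_4 without the lower orders in the output, the simplest workaround is to not omit any BLEU metrics and filter the returned dict yourself. A rough sketch (the reference/hypothesis strings here are just placeholders):

from nlgeval import NLGEval

nlgeval = NLGEval(no_skipthoughts=True, no_glove=True)  # compute all BLEU orders

# placeholder data: one list of references aligned with the hypotheses
references = ['the cat sat on the mat', 'there is a book on the desk']
hypothesis = ['a cat is on the mat', 'a book lies on the desk']

metrics_dict = nlgeval.compute_metrics([references], hypothesis)

# keep Bleu_4 and the non-BLEU metrics, drop Bleu_1..Bleu_3
filtered = {k: v for k, v in metrics_dict.items() if not k.startswith('Bleu_') or k == 'Bleu_4'}
print(filtered)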

juharris commented 6 years ago

I wouldn't mind changing the behavior, but this feature is mainly meant as a performance gain. The way a lot of the BLEU calculations are done, calculating Bleu_4 also computes the lower-order scores, so with the code written as it currently is there isn't much saved by returning Bleu_4 but not the others.
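To make that concrete: the BLEU scorer bundled with nlg-eval (a copy of pycocoevalcap, if I'm reading the layout right) returns all four n-gram scores from a single call, so computing Bleu_4 already computes Bleu_1 through Bleu_3. A rough sketch, with placeholder sentences:

from nlgeval.pycocoevalcap.bleu.bleu import Bleu

# gts/res map an example id to a list of sentences
gts = {0: ['the cat sat on the mat']}
res = {0: ['a cat is on the mat']}

# one call computes Bleu_1 through Bleu_4 together
score, _ = Bleu(4).compute_score(gts, res)
print(score)  # [Bleu_1, Bleu_2, Bleu_3, Bleu_4]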