Closed madaan closed 2 years ago
The --heavy-metrics option is not working in single file mode.
The bug can be reproduced using test data with the following command:
python run_metrics.py -r test_data/single_dataset.refs.json test_data/single_dataset.outs.json --heavy_metrics
The heavy metrics are not computed. The output is:
[I 220112 23:12:18 texts:32] Loading predictions for None
[I 220112 23:12:18 texts:32] Loading references for test_data/single_dataset.refs.json
[I 220112 23:12:18 __init__:124] Computing MSTTR for None...
[I 220112 23:12:18 __init__:124] Computing NGramStats for None...
[I 220112 23:12:18 __init__:150] Computing CHRF for None...
[I 220112 23:12:18 __init__:150] Computing NIST for None...
[I 220112 23:12:18 __init__:150] Computing BLEU for None...
[I 220112 23:12:18 __init__:150] Computing LocalRecall for None...
[I 220112 23:12:18 __init__:150] Computing ROUGE for None...
{
    "predictions_file": null,
    "N": 2,
    "msttr-100": NaN,
    "msttr-100_nopunct": NaN,
    "total_length": 27,
    "mean_pred_length": 13.5,
    "std_pred_length": 0.5,
    "median_pred_length": 13.5,
    "min_pred_length": 13,
    "max_pred_length": 14,
    "distinct-1": 0.6296296296296297,
    "vocab_size-1": 17,
    "unique-1": 9,
    "entropy-1": 3.9582291686698787,
    "distinct-2": 0.8,
    "vocab_size-2": 20,
    "unique-2": 15,
    "entropy-2": 4.243856189774724,
    "cond_entropy-2": 0.22256268772664103,
    "distinct-3": 0.8695652173913043,
    "vocab_size-3": 20,
    "unique-3": 17,
    "entropy-3": 4.262692390839622,
    "cond_entropy-3": 0.010140548890983904,
    "total_length-nopunct": 24,
    "mean_pred_length-nopunct": 12.0,
    "std_pred_length-nopunct": 1.0,
    "median_pred_length-nopunct": 12.0,
    "min_pred_length-nopunct": 11,
    "max_pred_length-nopunct": 13,
    "distinct-1-nopunct": 0.6666666666666666,
    "vocab_size-1-nopunct": 16,
    "unique-1-nopunct": 9,
    "entropy-1-nopunct": 3.886842188131012,
    "distinct-2-nopunct": 0.8181818181818182,
    "vocab_size-2-nopunct": 18,
    "unique-2-nopunct": 14,
    "entropy-2-nopunct": 4.095795255000933,
    "cond_entropy-2-nopunct": 0.18150945892357132,
    "distinct-3-nopunct": 0.9,
    "vocab_size-3-nopunct": 18,
    "unique-3-nopunct": 16,
    "entropy-3-nopunct": 4.1219280948873624,
    "cond_entropy-3-nopunct": 0.012496476250064989,
    "references_file": "test_data/single_dataset.refs.json",
    "chrf": 69.41824174364662,
    "chrf+": 69.96223388406395,
    "chrf++": 66.1348715190809,
    "nist": 3.9977776020129183,
    "bleu": 50.12433,
    "local_recall": { "1": 0.0, "2": 0.2857142857142857, "3": 0.8888888888888888, "4": 1.0 },
    "rouge1": { "precision": 0.76389, "recall": 0.7487, "fmeasure": 0.75211 },
    "rouge2": { "precision": 0.51719, "recall": 0.45112, "fmeasure": 0.48053 },
    "rougeL": { "precision": 0.69544, "recall": 0.71699, "fmeasure": 0.70229 },
    "rougeLsum": { "precision": 0.69544, "recall": 0.71699, "fmeasure": 0.70229 }
}
All the heavy metrics should be computed in addition to the lightweight metrics.
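One way to check the expected behavior, as a rough sketch only: parse the JSON that the script prints and see which heavy-metric keys are absent. The key names in HEAVY_KEYS and the helper missing_heavy_metrics are hypothetical and chosen for illustration; they are not taken from this report, so substitute whatever keys the tool actually emits.

```python
import json

# Hypothetical heavy-metric key names (an assumption, not from the report).
HEAVY_KEYS = {"bertscore", "bleurt"}


def missing_heavy_metrics(results_json, heavy_keys=HEAVY_KEYS):
    """Return the heavy-metric keys that do not appear in the results JSON."""
    # json.loads accepts the bare NaN tokens seen in the reported output.
    results = json.loads(results_json)
    return set(heavy_keys) - set(results)


# A shortened stand-in for the output shown in the report.
sample = '{"predictions_file": null, "N": 2, "msttr-100": NaN, "bleu": 50.12433}'
print(sorted(missing_heavy_metrics(sample)))
```

An empty result would mean every key in the assumed list is present; a non-empty one lists what was skipped.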