Closed pfliu-nlp closed 2 years ago
For each evaluation a user runs, we can record the following information:

1. Define a JSON structure (the configuration). It could look like this:

```json
{
  "metrics": [],
  "analysis": "",
  "confidence_interval": ""
}
```

sacrebleu has something similar that we can use as a reference:
https://github.com/mjpost/sacrebleu#json-output

```json
{
  "name": "BLEU",
  "score": 20.8,
  "signature": "nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0",
  "verbose_score": "54.4/26.6/14.9/8.7 (BP = 1.000 ratio = 1.026 hyp_len = 62880 ref_len = 61287)",
  "nrefs": "1",
  "case": "mixed",
  "eff": "no",
  "tok": "13a",
  "smooth": "exp",
  "version": "2.0.0"
}
```

2. Automatically generate a description of the evaluation setting, for example:

> Our systems are evaluated by SacreROUGE version 2.0, with XX, YY, ZZ. We use XX to calculate the confidence interval for each system.

We can also refer to sacrebleu's version signatures: https://github.com/mjpost/sacrebleu#version-signatures
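The two steps of the proposal could be sketched roughly as follows. This is only an illustration, not an existing API: `build_config`, `parse_signature`, and `describe_setting` are hypothetical names, and the `tool` and `version` arguments are assumed to be supplied by the caller. The signature parser simply splits the `key:value|key:value` format shown in the sacrebleu example above.

```python
import json


def build_config(metrics, analysis="", confidence_interval=""):
    """Hypothetical helper: assemble the configuration proposed in step 1."""
    return {
        "metrics": list(metrics),
        "analysis": analysis,
        "confidence_interval": confidence_interval,
    }


def parse_signature(signature):
    """Split a 'key:value|key:value' signature string into a dict."""
    return dict(part.split(":", 1) for part in signature.split("|"))


def describe_setting(config, tool, version):
    """Hypothetical helper: render the description proposed in step 2."""
    metrics = ", ".join(config["metrics"])
    return (
        f"Our systems are evaluated by {tool} version {version}, with {metrics}. "
        f"We use {config['confidence_interval']} to calculate the confidence "
        "interval for each system."
    )


# Placeholder values (XX, YY, ZZ) are kept exactly as in the proposal.
config = build_config(["XX", "YY", "ZZ"], confidence_interval="XX")
print(json.dumps(config, indent=2))
print(describe_setting(config, tool="SacreROUGE", version="2.0"))

fields = parse_signature("nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.0.0")
print(fields["tok"], fields["version"])
```

Generating the description from the same configuration that was used for the run keeps the two from drifting apart, which is the main point of a sacrebleu-style signature.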