Robert-xiaoqiang / SkillQG

This repository is the official implementation of the ACL 2023 paper (Findings): SkillQG: Learning to Generate Question for Reading Comprehension Assessment

Questions about the eval package #2

Open DLiquor opened 1 year ago

DLiquor commented 1 year ago

I have tried to run a simple baseline with bart-base, but I got a higher BLEU-4 score with the package "nlg-eval". I am wondering whether I used the wrong package, since the metric directory in your repository is None.

DLiquor commented 1 year ago

My BLEU-4 score is around 27.
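For reference, a minimal sketch of how corpus-level BLEU-4 is typically obtained with "nlg-eval" (the hypotheses and references below are placeholders, not the actual model outputs from this thread):

```python
from nlgeval import NLGEval

# Skip the embedding-based metrics so only the n-gram metrics (BLEU, ROUGE-L, etc.) are computed.
nlgeval = NLGEval(no_skipthoughts=True, no_glove=True)

# Placeholder data: one hypothesis per test example, one reference set parallel to the hypotheses.
hypotheses = ["what did the author do ?", "why was the city founded ?"]
references = [["what did the writer do ?", "why was the town founded ?"]]

# Corpus-level scores over the whole test split in a single call.
metrics = nlgeval.compute_metrics(references, hypotheses)
print("corpus-level BLEU-4:", metrics["Bleu_4"])
```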

XDeepAzure commented 1 year ago

I ran into this problem as well, same question. @Robert-xiaoqiang

Robert-xiaoqiang commented 1 year ago

Hi all. Actually, I employ the pycocoevalcap repo as the evaluation script to assess the syntactic quality of the generations, and it appears to share the same BLEU-4 and ROUGE-L implementations as "nlg-eval". The key difference may lie in how the script is used: I leverage the function "compute_individual_metrics" to compute segment-level quality and then average it over the test split. I hope this answer is helpful to you.
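A minimal sketch of the difference being described, using pycocoevalcap (this is an illustration under my reading of the answer, not the repository's actual evaluation script; the data is placeholder):

```python
from pycocoevalcap.bleu.bleu import Bleu

# Placeholder data: model outputs and their gold references.
hypotheses = ["what did the author do ?", "why was the city founded ?"]
references = [["what did the writer do ?"], ["why was the town founded ?"]]

# 1) Corpus-level BLEU-4: one call over the whole test split.
gts = {i: refs for i, refs in enumerate(references)}
res = {i: [hyp] for i, hyp in enumerate(hypotheses)}
corpus_scores, _ = Bleu(4).compute_score(gts, res)
print("corpus-level BLEU-4:", corpus_scores[3])

# 2) Segment-level BLEU-4 averaged over examples (the usage described above):
#    score each (hypothesis, references) pair separately, then take the mean.
segment_bleu4 = []
for i, (hyp, refs) in enumerate(zip(hypotheses, references)):
    scores, _ = Bleu(4).compute_score({i: refs}, {i: [hyp]})
    segment_bleu4.append(scores[3])
print("averaged segment-level BLEU-4:", sum(segment_bleu4) / len(segment_bleu4))
```

Because BLEU's brevity penalty and n-gram statistics are aggregated differently in the two settings, averaging per-segment scores generally does not equal the corpus-level score, which could explain the gap against the ~27 reported with "nlg-eval".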

DLiquor commented 1 year ago

Thank you very much!!