atcbosselut / comet-commonsense

Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction" https://arxiv.org/abs/1906.05317
Apache License 2.0

ATOMIC Evaluation #6

Closed · guoday closed this issue 5 years ago

guoday commented 5 years ago

For ATOMIC evaluation, the evaluation script doesn't report a BLEU-2 score. Will you update the script to score the generations with BLEU-2?

guoday commented 5 years ago

Following the ATOMIC paper, which states "we compute the average BLEU score (n = 2, Smoothing1; Chen and Cherry, 2014) between each sequence in the top 10 predictions and the corresponding set of MTurk annotations", I tried to evaluate the BLEU-2 score, but I only get 12.2, which is lower than the numbers reported in your paper. I don't know what I'm misunderstanding. Can you help?
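For reference, here is a minimal sketch of the BLEU-2 protocol quoted above, using NLTK's `sentence_bleu` with Smoothing 1 (Chen and Cherry, 2014). The function name `average_bleu2` and the `predictions` / `references` data layout are assumptions for illustration, not the repo's actual evaluation code:

```python
# Hypothetical sketch of the quoted BLEU-2 protocol; not the repo's script.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Smoothing 1 from Chen and Cherry (2014), as quoted from the ATOMIC paper.
smoothing = SmoothingFunction().method1

def average_bleu2(predictions, references):
    """Average BLEU-2 of each prediction (e.g. the top 10 candidates)
    against the full set of MTurk reference annotations."""
    scores = []
    for hyp in predictions:
        scores.append(sentence_bleu(
            [ref.split() for ref in references],  # multi-reference BLEU
            hyp.split(),
            weights=(0.5, 0.5),                   # bigram BLEU (n = 2)
            smoothing_function=smoothing,
        ))
    return sum(scores) / len(scores)
```

Differences from reported numbers often come down to details this sketch leaves open, such as tokenization of hypotheses and references, so the repo's own script should be treated as the ground truth.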

atcbosselut commented 5 years ago

Added a BLEU-2 evaluation script at scripts/evaluate/bleu_atomic.py.