Closed. guoday closed this issue 5 years ago.
Following the ATOMIC paper, which states "we compute the average BLEU score (n = 2, Smoothing1; Chen and Cherry, 2014) between each sequence in the top 10 predictions and the corresponding set of MTurk annotations", I tried to evaluate the BLEU-2 score, but I only get 12.2. This is lower than the numbers reported in the paper, and I don't know what I am misunderstanding. Can you help?
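For reference, here is a minimal sketch (not the repository's official script) of how that evaluation can be reproduced with NLTK, assuming `predictions` holds the top-10 generated sequences for an event/relation pair and `references` holds the corresponding MTurk annotations; both variable names are hypothetical.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def average_bleu2(predictions, references):
    """Average sentence-level BLEU-2 (Smoothing1) of each prediction
    against the full set of reference annotations."""
    smooth = SmoothingFunction().method1  # "Smoothing1" from Chen and Cherry (2014)
    refs_tok = [r.split() for r in references]
    scores = []
    for pred in predictions:
        # weights=(0.5, 0.5) restricts BLEU to unigrams and bigrams (n = 2)
        scores.append(
            sentence_bleu(refs_tok, pred.split(),
                          weights=(0.5, 0.5),
                          smoothing_function=smooth)
        )
    return sum(scores) / len(scores) if scores else 0.0
```

Differences in tokenization or in how the reference set is grouped per prediction could easily account for a gap against the reported numbers.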
Added a BLEU-2 evaluation script in scripts/evaluate/bleu_atomic.py
For ATOMIC evaluation, the current evaluation script doesn't report a BLEU-2 score. Will you update the script to score the generations with BLEU-2?