Gary-code opened this issue 5 days ago

Congratulations on having your paper accepted to ACL 2024! I have a question regarding the evaluation metrics in your work. I noticed that the BLEU-4 scores reported for all models are quite high. I was curious to know which script or tool you used for evaluation.

scoring.py has the details you want about the evaluation metrics 😁
The script for BLEU-4 evaluation:
import evaluate

# Local path to the `bleu` metric script from the Hugging Face `evaluate` library
bleu = evaluate.load('evaluate/metrics/bleu')
# `hyp` is a single hypothesis string and `ref` its reference; max_order defaults to 4, so this is BLEU-4
bleu4 = bleu.compute(predictions=[hyp], references=[ref])['bleu']
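
For anyone who wants to reproduce the number end to end, here is a minimal, self-contained sketch of the same metric call. The hypothesis/reference sentences are made-up placeholders, and it loads the metric by name from the Hugging Face Hub instead of the local `evaluate/metrics/bleu` path, so treat it as an illustration rather than the repository's scoring.py:

```python
import evaluate

# Made-up placeholder sentences, not data from the paper.
predictions = [
    "a man is playing a guitar",
    "two dogs run across the field",
]
references = [
    ["a man plays the guitar"],                                      # one or more references per prediction
    ["two dogs are running across a field", "dogs run in a field"],
]

# Loading by name fetches the BLEU metric script from the Hugging Face Hub;
# the snippet above loads the same script from a local checkout instead.
bleu = evaluate.load("bleu")

# max_order defaults to 4, so result["bleu"] is the corpus-level BLEU-4 score.
result = bleu.compute(predictions=predictions, references=references)
print(result["bleu"], result["precisions"])
```

Note that BLEU implementations differ in tokenization and smoothing, so scores from this metric are not directly comparable to numbers produced with other tools such as NLTK or sacrebleu.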