AIPHES / emnlp19-moverscore

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
MIT License

Getting same results for n_gram=1,2,3? #14

Closed priyamtejaswin closed 3 years ago

priyamtejaswin commented 3 years ago

Hi folks,

Thanks for releasing the code and for making the API easy to use. Changing n_gram does not seem to change the scores, so I'm wondering if I'm doing something wrong.

I'm using the code provided in the README:

from moverscore_v2 import get_idf_dict, word_mover_score
from collections import defaultdict

idf_dict_hyp = get_idf_dict(translations)  # idf_dict_hyp = defaultdict(lambda: 1.)
idf_dict_ref = get_idf_dict(references)  # idf_dict_ref = defaultdict(lambda: 1.)

scores = word_mover_score(references, translations, idf_dict_ref, idf_dict_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)

I get the same scores for n_gram values of 1, 2, and 3. My dataset is the Gigaword summarization dev set:

andyweizhao commented 3 years ago

Thanks a lot for your interest. In moverscore_v2.py, n-gram matching and p-means are ignored by design for speed and simplicity, so changing n_gram has no effect there. The full version is in moverscore.py, but it takes longer to run.
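
For anyone who wants to confirm this behavior on their own data, here is a minimal sketch of a check that compares the scores returned for several n_gram values. The helpers `scores_by_ngram` and `ngram_has_effect` and the two stub scorers are hypothetical names made up for this illustration; they are not part of MoverScore. The stubs only stand in for a real scorer so that the snippet runs without downloading any model.

```python
from typing import Callable, Dict, List, Sequence


def scores_by_ngram(score_fn: Callable[[int], Sequence[float]],
                    n_values: Sequence[int] = (1, 2, 3)) -> Dict[int, List[float]]:
    """Call score_fn(n) for each n and collect the scores per n."""
    return {n: list(score_fn(n)) for n in n_values}


def ngram_has_effect(results: Dict[int, List[float]], tol: float = 1e-9) -> bool:
    """Return True if any setting gives scores that differ from the first one."""
    all_scores = list(results.values())
    baseline = all_scores[0]
    return any(
        len(scores) != len(baseline)
        or any(abs(a - b) > tol for a, b in zip(scores, baseline))
        for scores in all_scores[1:]
    )


# Hypothetical stand-in that ignores n, mimicking what is described
# above for moverscore_v2.py.
def stub_ignores_n(n: int) -> List[float]:
    return [0.5, 0.7]


# Hypothetical stand-in whose output depends on n.
def stub_uses_n(n: int) -> List[float]:
    return [0.5 / n, 0.7 / n]


print(ngram_has_effect(scores_by_ngram(stub_ignores_n)))  # False
print(ngram_has_effect(scores_by_ngram(stub_uses_n)))     # True
```

With the real library, the stub could be replaced by something like `lambda n: word_mover_score(references, translations, idf_dict_ref, idf_dict_hyp, stop_words=[], n_gram=n, remove_subwords=True)`, using the call from the question. Whether moverscore.py exposes the same function name and arguments is an assumption here, so check that module before swapping the import.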