Maluuba / nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.
http://arxiv.org/abs/1706.09799

Infersent #71

Open Radhikadua123 opened 5 years ago

Radhikadua123 commented 5 years ago

Hi,

I am trying to obtain the semantic similarity between the generated and the ground truth sentence.

I used all of these metrics to evaluate the generated sentences (validation dataset):

| Metric | Score |
| --- | --- |
| BLEU 1 | 0.128031 |
| BLEU 2 | 0.056153 |
| BLEU 3 | 0.029837 |
| BLEU 4 | 0.013649 |
| METEOR | 0.305482 |
| ROUGE_L | 0.148652 |
| CIDEr | 0.069519 |
| SkipThought cosine similarity | 0.765784 |
| Embedding Average cosine similarity | 0.973187 |
| Vector Extrema cosine similarity | 0.683888 |
| Greedy Matching score | 0.94496 |
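For context, this is roughly how I computed the scores above — a minimal sketch assuming the `NLGEval` interface described in this repository's README; the example sentences are placeholders:

```python
from nlgeval import NLGEval

# Load all scorers once (this also loads the embedding-based models).
nlgeval = NLGEval()

# One hypothesis string and a list of reference strings (placeholders here).
references = ["the ground truth sentence"]
hypothesis = "the generated sentence"

# Returns a dict with keys such as 'Bleu_1'..'Bleu_4', 'METEOR', 'ROUGE_L',
# 'CIDEr', plus the SkipThought / embedding-based similarity scores.
metrics = nlgeval.compute_individual_metrics(references, hypothesis)
print(metrics)
```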

Some of these metrics indicate that the sentences are quite similar, while others suggest they are different. Could you please suggest a metric for measuring the semantic similarity between sentences?

How about InferSent and Word Mover's Distance? I think you should consider adding these metrics for evaluating generated text (a sketch of WMD is shown below). This repository is very helpful for that kind of evaluation.
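As an illustration of the Word Mover's Distance suggestion, here is a minimal sketch using gensim's `wmdistance` on pretrained word2vec vectors; the model name and sentences are placeholders, not part of this repository:

```python
import gensim.downloader as api

# Any KeyedVectors word-embedding model works; this one downloads ~1.6 GB on first use.
model = api.load("word2vec-google-news-300")

reference = "the cat sat on the mat".lower().split()
hypothesis = "a cat is sitting on the mat".lower().split()

# WMD is a distance, so lower means more similar (unlike the cosine-similarity
# metrics reported above, where higher means more similar).
distance = model.wmdistance(reference, hypothesis)
print(f"Word Mover's Distance: {distance:.4f}")
```

InferSent would instead give a fixed-size sentence embedding per sentence, and cosine similarity between those embeddings could be used the same way as the existing embedding-based metrics.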

Radhikadua123 commented 5 years ago

Can you please suggest which of these metrics is most widely used to measure the semantic similarity between sentences? Thanks!