AkihikoWatanabe commented 1 year ago

A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than surface forms. In this paper we investigate strategies to encode system and reference texts to devise a metric that shows a high correlation with human judgment of text quality. We validate our new metric, namely MoverScore, on a number of text generation tasks including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform the best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
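
The abstract judges a metric by how well it correlates with human judgment of text quality. A minimal sketch of that meta-evaluation step computes the Pearson correlation between a metric's scores and human ratings of the same outputs. The function name `pearson` is hypothetical, and any scores you feed it are your own, not data from the paper.

```python
import math


def pearson(metric_scores, human_scores):
    """Pearson correlation between metric scores and human ratings (hypothetical helper)."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(metric_scores) / len(metric_scores)
    my = sum(human_scores) / len(human_scores)
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(metric_scores, human_scores))
    vx = sum((x - mx) ** 2 for x in metric_scores)
    vy = sum((y - my) ** 2 for y in human_scores)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(vx * vy)
```

Values near 1 indicate that metric scores rise linearly with human ratings; a rank correlation such as Kendall's tau is a common complement when only the ordering of outputs matters.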

Translation (by gpt-3.5-turbo)

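A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output with references based on their meaning rather than their surface form. In this paper, we investigate strategies for encoding system and reference texts and devise a metric that shows a high correlation with human judgments of text quality. We validate our new metric, MoverScore, on text generation tasks such as summarization, machine translation, image captioning, and data-to-text generation, with outputs produced by a variety of neural and non-neural systems. Our results show that metrics combining contextualized representations with a distance measure perform best. Such metrics also show strong generalization across tasks. For convenience, we provide our metric as a web service.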
Summary (by gpt-3.5-turbo)
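This study investigates evaluation metrics for text generation systems and proposes a metric that compares system output with reference texts based on their meaning. The metric is effective on tasks such as summarization, machine translation, image captioning, and data-to-text generation, and variants that combine contextualized representations with a distance measure perform best. The proposed metric also generalizes well across tasks and is available as a web service.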

AkihikoWatanabe commented 9 months ago

Explanation of Word Mover's Distance (WMD): https://yubessy.hatenablog.com/entry/2017/01/10/122737
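
The WMD idea that MoverScore builds on can be illustrated with a small pure-Python sketch. Each text is treated as a set of embedding vectors with weights summing to 1, and the distance is the minimum total cost (mass moved times Euclidean distance) of transporting one set onto the other. Assumptions: the toy vectors stand in for contextualized (e.g. BERT) embeddings, the caller supplies the weights (IDF-based weights are one option), and `earth_mover_distance` and `euclidean` are hypothetical helper names, not the API of the authors' released code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def earth_mover_distance(xs, wx, ys, wy, eps=1e-9):
    """Exact EMD between two weighted point sets whose weights each sum to 1.

    Hypothetical helper, not the moverscore package API. Solves the
    transportation problem by successive shortest paths: repeatedly push
    mass along the cheapest residual path from a point with leftover
    supply to a point with unmet demand.
    """
    n, m = len(xs), len(ys)
    cost = [[euclidean(x, y) for y in ys] for x in xs]
    flow = [[0.0] * m for _ in range(n)]
    supply, demand = list(wx), list(wy)
    inf = float("inf")
    while max(demand) > eps:
        # Nodes 0..n-1 are source points, n..n+m-1 are target points.
        dist = [0.0 if s > eps else inf for s in supply] + [inf] * m
        parent = [None] * (n + m)
        for _ in range(n + m):  # Bellman-Ford: residual costs can be negative
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge i -> j with unlimited capacity.
                    d = dist[i] + cost[i][j]
                    if d < dist[n + j] - 1e-12:
                        dist[n + j], parent[n + j], changed = d, i, True
                    # Backward edge j -> i exists only if i already ships to j.
                    if flow[i][j] > eps:
                        d = dist[n + j] - cost[i][j]
                        if d < dist[i] - 1e-12:
                            dist[i], parent[i], changed = d, n + j, True
            if not changed:
                break
        target = min((j for j in range(m) if demand[j] > eps),
                     key=lambda j: dist[n + j])
        if dist[n + target] == inf:
            break  # no supply left: the two weight vectors had different totals
        path = [n + target]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()  # source -> target -> source -> ... -> target
        delta = min(supply[path[0]], demand[target])
        for a, b in zip(path, path[1:]):
            if a >= n:  # a backward edge can only undo flow that exists
                delta = min(delta, flow[b][a - n])
        for a, b in zip(path, path[1:]):
            if a < n:
                flow[a][b - n] += delta
            else:
                flow[b][a - n] -= delta
        supply[path[0]] -= delta
        demand[target] -= delta
    return sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m))
```

For example, `earth_mover_distance([(0.0, 0.0), (1.0, 0.0)], [0.5, 0.5], [(0.0, 1.0), (1.0, 1.0)], [0.5, 0.5])` returns 1.0: each point carries half the mass a distance of 1. This loop is meant for reading, not speed; for real embedding sets a dedicated optimal-transport solver is the practical choice.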

MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance, Zhao+, EMNLP-IJCNLP'19 #946
