The creation of a quality summarization dataset is an expensive, time-consuming effort, requiring the production and evaluation of summaries by both trained humans and machines. The returns to such an effort would increase significantly if the dataset could be used in additional languages without repeating human annotations. To investigate how much we can trust machine translation of summarization datasets, we translate the English SummEval dataset to seven languages and compare performances across automatic evaluation measures. We explore equivalence testing as the appropriate statistical paradigm for evaluating correlations between human and automated scoring of summaries. We also consider the effect of translation on the relative performance between measures. We find some potential for dataset reuse in languages similar to the source and along particular dimensions of summary quality. Our code and data can be found at https://github.com/PrimerAI/primer-research/.

Translation (by gpt-3.5-turbo)

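Creating a summarization dataset is a costly and time-consuming task, because it requires summaries to be produced and evaluated by both trained humans and machines. The return on such work increases substantially if the dataset can be used in additional languages without repeating the human annotation. To investigate how far machine translation of summarization datasets can be trusted, we translate the English SummEval dataset into seven languages and compare performance across automatic evaluation measures. We explore equivalence testing as the appropriate statistical paradigm for evaluating correlations between human and automated scoring of summaries. We also consider the effect of translation on the relative performance between measures. We find potential for dataset reuse in languages similar to the source and along particular dimensions of summary quality. The code and data are available at https://github.com/PrimerAI/primer-research/.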
Summary (by gpt-3.5-turbo)
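Creating a summarization dataset is costly and time-consuming, but machine-translating an existing dataset makes it usable in additional languages. This study translates an English summarization dataset into seven languages and compares performance across automatic evaluation measures. It also evaluates the correlation between human and automated scoring of summaries and considers how translation affects performance. Finally, it looks at particular dimensions of summary quality to identify where dataset reuse is feasible.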
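
The abstract names equivalence testing as the paradigm for comparing human-metric correlations before and after translation. The following is a minimal sketch of that idea, not the paper's actual procedure: it bootstraps the difference between two Kendall tau correlations and declares equivalence when the (1 - 2·alpha) percentile interval lies entirely inside a margin, which is the confidence-interval form of two one-sided tests (TOST). The function names, the margin of 0.1, the choice of Kendall's tau-a, and the bootstrap settings are all assumptions made for illustration.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)


def equivalence_test(human, metric_src, metric_tgt,
                     margin=0.1, alpha=0.05, n_boot=1000, seed=0):
    """Bootstrap TOST-style check (illustrative; all defaults are assumptions).

    human      : human scores for each summary
    metric_src : automatic metric scores on the original (source) summaries
    metric_tgt : automatic metric scores on the translated summaries
    Returns (lower, upper, equivalent) for the difference in correlations.
    """
    rng = random.Random(seed)
    n = len(human)
    diffs = []
    for _ in range(n_boot):
        # Resample summaries with replacement, keeping the three scores paired.
        idx = [rng.randrange(n) for _ in range(n)]
        h = [human[i] for i in idx]
        a = [metric_src[i] for i in idx]
        b = [metric_tgt[i] for i in idx]
        diffs.append(kendall_tau(h, a) - kendall_tau(h, b))
    diffs.sort()
    # Percentile interval with coverage 1 - 2*alpha.
    lower = diffs[int(round(alpha * n_boot))]
    upper = diffs[int(round((1 - alpha) * n_boot)) - 1]
    # Equivalent only if the whole interval sits inside (-margin, +margin).
    return lower, upper, (-margin < lower and upper < margin)


if __name__ == "__main__":
    human = list(range(20))
    print(equivalence_test(human, human, human))        # identical metrics
    print(equivalence_test(human, human, human[::-1]))  # reversed metric
```

Unlike a standard significance test, where failing to reject the null says nothing about sameness, this framing puts the burden of proof on showing that the correlations are close, which is the question that matters when deciding whether a translated dataset can stand in for the original.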

Does Summary Evaluation Survive Translation to Other Languages?, Braun+, NAACL'22 #951
