While automatic summarization evaluation methods developed for English are routinely applied to other languages, this is the first attempt to systematically quantify their panlinguistic efficacy. We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall). Based on this, we evaluate 19 summarization evaluation metrics, and find that using multilingual BERT within BERTScore performs well across all languages, at a level above that for English.
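As a rough illustration of how a BERTScore-style metric produces separate precision and recall values, the sketch below greedily matches token vectors by cosine similarity. It is a minimal sketch, not the paper's setup: the vectors are toy values standing in for multilingual BERT embeddings, the function names `cosine` and `greedy_match_scores` are hypothetical, and idf weighting is omitted. Reading precision as a proxy for focus and recall as a proxy for coverage is an analogy to the annotation scheme above, not a claim from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(summary_vecs, reference_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    Precision: each summary token is matched to its most similar
    reference token. Recall: each reference token is matched to its
    most similar summary token.
    """
    precision = sum(
        max(cosine(s, r) for r in reference_vecs) for s in summary_vecs
    ) / len(summary_vecs)
    recall = sum(
        max(cosine(r, s) for s in summary_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy vectors (assumed for illustration, not real embeddings).
# The reference has two "tokens"; the summary reproduces only one of them.
reference = [[1.0, 0.0], [0.0, 1.0]]
summary = [[1.0, 0.0]]

precision, recall, f1 = greedy_match_scores(summary, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case everything the summary says is supported by the reference, so precision is high, while half of the reference is left uncovered, so recall is lower.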

Translation (by gpt-3.5-turbo)

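Automatic summarization evaluation methods developed for English are routinely applied to other languages, but this is the first attempt to systematically quantify their panlingual effectiveness. We take a summarization corpus covering eight different languages and manually annotate generated summaries for focus (precision) and coverage (recall). Based on these annotations, we evaluate 19 summarization evaluation metrics and find that BERTScore with multilingual BERT performs well across all languages, at a level above that for English.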
Summary (by gpt-3.5-turbo)
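Using summarization corpora in several languages, this study shows that BERTScore with multilingual BERT outperforms the other summarization evaluation metrics. This indicates that it is also effective for languages other than English.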

AkihikoWatanabe / paper_notes

Evaluating the Efficacy of Summarization Evaluation across Languages, Koto+ (w/ Prof. Tim), Findings of ACL'21 #979
