AI4Bharat / indicnlp_catalog

A collaborative catalog of NLP resources for Indic languages
https://ai4bharat.github.io/indicnlp_catalog

XL-SUM #168

Closed anoopkunchukuttan closed 2 years ago

anoopkunchukuttan commented 2 years ago

XL-Sum: a large-scale multilingual abstractive summarization dataset containing 150k examples across 10 Indic languages.

It is part of the larger XL-Sum collection, which contains 1 million examples mined from BBC. The short summaries are professionally written by BBC staff. The dataset is widely used for multilingual summarization studies.

Manual evaluation on a subset shows that the summaries fulfil many attributes of a good summary.

Paper link: https://arxiv.org/abs/2106.13822 (published in Findings of ACL 2021)
Dataset: https://github.com/csebuetnlp/xl-sum

Work from BUET (Bangladesh), Univ of Rochester, Monash Univ and Swinburne Univ of Technology
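
For anyone adding this entry who wants a quick sanity check of the data, here is a minimal sketch of reading a split and measuring how short the summaries are relative to the articles. It assumes each split is a JSON Lines file whose rows carry `text` and `summary` fields; that layout is an assumption and should be checked against the dataset repository. The helper names `read_records` and `mean_compression` are hypothetical, and the sample rows are made up for illustration.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_records(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines row."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_compression(records: Iterable[dict]) -> float:
    """Average summary/article length ratio, counted in whitespace tokens."""
    ratios = []
    for rec in records:
        # Assumed field names: "text" (article body) and "summary".
        text_len = len(rec["text"].split())
        if text_len == 0:
            continue  # skip rows with an empty article
        ratios.append(len(rec["summary"].split()) / text_len)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Made-up rows standing in for a real split file.
sample = io.StringIO(
    '{"id": "a1", "text": "one two three four", "summary": "one two"}\n'
    "\n"
    '{"id": "a2", "text": "one two three four five six seven eight", "summary": "one two"}\n'
)
records = list(read_records(sample))
ratio = mean_compression(records)
```

To run this on real data, replace the `io.StringIO` sample with an open file handle, e.g. `open(path, encoding="utf-8")`. Whitespace tokenization is only a crude length proxy across scripts.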