anoopkunchukuttan closed 2 years ago
XL-Sum: A large-scale multilingual abstractive summarization dataset containing 150k examples across 10 Indic languages.
These examples are part of a larger collection of 1 million examples mined from the BBC. The short summaries are professionally written by BBC staff. The dataset is widely used in multilingual summarization studies.
Manual evaluation on a subset shows that the summaries fulfil many attributes of a good summary.
Paper link: https://arxiv.org/abs/2106.13822 (published in Findings of ACL 2021)
Dataset: https://github.com/csebuetnlp/xl-sum
Work from BUET (Bangladesh), University of Rochester, Monash University, and Swinburne University of Technology.