SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for CrossSum #396

Closed by SamuelCahyawijaya 7 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: crosssum/crosssum.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?crosssum

Dataset: crosssum
Description: A large-scale cross-lingual summarization dataset containing article-summary samples in 1,500+ language pairs, including pairs among Burmese, Indonesian, and Vietnamese. Each article in the first (source) language is paired with a summary in the second (target) language.
Subsets: id-my, id-vi, my-id, my-vi, vi-id, vi-my
Languages: ind, vie, mya
Tasks: Abstractive Summarization
License: Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage: https://drive.google.com/file/d/11yCJxK5necOyZBxcJ6jncdCFgNxrsl4m/view
HF URL: https://huggingface.co/datasets/csebuetnlp/CrossSum
Paper URL: https://aclanthology.org/2023.acl-long.143.pdf
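
The six subsets listed above are the ordered source-target pairs over the three languages. As a minimal sketch, they could be enumerated as follows. The helper name `subset_names` and the `ISO3_TO_ISO1` mapping are hypothetical, introduced only for illustration, and are not taken from the SEACrowd codebase.

```python
from itertools import permutations

# ISO 639-3 codes from the Languages row, mapped to the two-letter
# codes used in the Subsets row. This mapping is an assumption made
# for illustration; it is not part of any existing dataloader.
ISO3_TO_ISO1 = {"ind": "id", "mya": "my", "vie": "vi"}


def subset_names(languages):
    """Return every ordered source-target pair as 'src-tgt' (hypothetical helper)."""
    codes = sorted(ISO3_TO_ISO1[lang] for lang in languages)
    # permutations(..., 2) yields ordered pairs without repeating a language,
    # so "id-my" and "my-id" are distinct subsets and "id-id" never appears.
    return [f"{src}-{tgt}" for src, tgt in permutations(codes, 2)]


if __name__ == "__main__":
    print(subset_names(["ind", "vie", "mya"]))
```

Order matters here because the article is in the first language and the summary is in the second, so each unordered language pair gives two subsets.
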

elyanah-aco commented 8 months ago

self-assign