SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for MassiveSumm #582

Open SamuelCahyawijaya opened 3 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: massivesum/massivesum.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?massivesum

Dataset: massivesum
Description: A (massive) multilingual dataset for summarization consisting of 92 diverse languages, across 35 writing scripts.
Subsets: fil, ind, khm, lao, mya, tha, vie
Languages: fil, ind, khm, lao, mya, tha, vie
Tasks: Language Modeling, Summarization
License: Unknown (unknown)
Homepage: https://github.com/danielvarab/massive-summ
HF URL: -
Paper URL: https://aclanthology.org/2021.emnlp-main.797/
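
For whoever picks this up, here is a minimal sketch of how the seven subsets listed above might be expanded into per-subset loader config names. The naming pattern `<dataset>_<subset>_<schema>`, the schema names `source` and `seacrowd_t2t`, and the `LoaderConfig` class are assumptions made for illustration, not something stated in this issue; check the datahub contributing guide for the actual convention before relying on them.

```python
from dataclasses import dataclass

# Values taken from the issue's metadata above.
DATASET_NAME = "massivesum"
SUBSETS = ["fil", "ind", "khm", "lao", "mya", "tha", "vie"]

# Assumption: one raw "source" schema plus one text-to-text schema for summarization.
SCHEMAS = ["source", "seacrowd_t2t"]


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical stand-in for a dataloader config entry."""

    name: str
    subset_id: str
    schema: str


def build_configs(dataset_name=DATASET_NAME, subsets=SUBSETS, schemas=SCHEMAS):
    """Return one config per (subset, schema) pair, in subset order."""
    return [
        LoaderConfig(
            name=f"{dataset_name}_{subset}_{schema}",
            subset_id=subset,
            schema=schema,
        )
        for subset in subsets
        for schema in schemas
    ]


configs = build_configs()
for config in configs:
    print(config.name)
```

Keeping the subset list in one place means adding or dropping a language only touches `SUBSETS`, and the config names stay consistent.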

Dataloader name: multilingual_nli_26lang/multilingual_nli_26lang.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?multilingual_nli_26lang

richardy-lobo-sapan commented 2 months ago

self-assign