SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for TMAD Malay Corpus #395

Closed. SamuelCahyawijaya closed this issue 7 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: tmad_malay_corpus/tmad_malay_corpus.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?tmad_malay_corpus

Dataset: tmad_malay_corpus
Description: The Towards Malay Abbreviation Disambiguation (TMAD) Malay Corpus includes sentences from Malay news sites with abbreviations and their meanings. Only abbreviations with more than one possible meaning are included.
Subsets: -
Languages: zlm
Tasks: Word Sense Disambiguation
License: Unknown (unknown)
Homepage: https://github.com/bhysss/TMAD-CUM/tree/master
HF URL: -
Paper URL: https://www.researchgate.net/publication/374540148_Towards_Malay_Abbreviation_Disambiguation_Corpus_and_Unsupervised_Model
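
For anyone picking this up, the inclusion rule in the description (only abbreviations with more than one possible meaning) can be illustrated with a minimal sketch. The function name `ambiguous_abbreviations` and the toy pairs are hypothetical placeholders, not taken from the corpus, and the actual file format in the TMAD-CUM repository may differ.

```python
from collections import defaultdict


def ambiguous_abbreviations(pairs):
    """Return abbreviations that map to more than one distinct meaning.

    `pairs` is an iterable of (abbreviation, meaning) tuples. This input
    shape is an assumption made for illustration only.
    """
    senses = defaultdict(set)
    for abbr, meaning in pairs:
        senses[abbr].add(meaning)
    # Keep only abbreviations with at least two distinct meanings.
    return {abbr: sorted(m) for abbr, m in senses.items() if len(m) > 1}


# Placeholder data, NOT drawn from the TMAD corpus.
toy_pairs = [
    ("AB", "meaning one"),
    ("AB", "meaning two"),
    ("CD", "only meaning"),
    ("AB", "meaning one"),  # duplicate sense, counted once
]

result = ambiguous_abbreviations(toy_pairs)
print(result)
```

In this toy input, `AB` is kept because it has two distinct meanings, while `CD` is dropped because it has only one.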

ssun32 commented 8 months ago

self-assign