SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for MEN-Dataset #587

Closed by SamuelCahyawijaya 1 month ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: men/men.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?men

Dataset men
Description The Malaysian English News (MEN) dataset includes 200 Malaysian English news articles with human-annotated entities and relations (6,061 entities and 3,268 relation instances in total). Malaysian English combines elements of standard English with Malay, Chinese, and Indian languages. Four human annotators were split into two groups, and each group annotated 100 news articles. Inter-annotator agreement was calculated between two or more annotators working on the same task (entity annotation: F1-score 0.82; relation annotation: F1-score 0.51).
Subsets -
Languages eng
Tasks Named Entity Recognition, Relation Extraction
License MIT (mit)
Homepage https://github.com/mohanraj-nlp/MEN-Dataset/tree/main
HF URL -
Paper URL https://arxiv.org/abs/2402.14521
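
For anyone checking the agreement figures in the description: F1-based inter-annotator agreement can be sketched roughly as below. This is only an illustration, assuming each annotator's output is a set of (start, end, label) spans and that only exact matches count. The span format, the labels, and the function name `pairwise_f1` are hypothetical and not taken from the MEN paper or repository.

```python
from typing import Set, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def pairwise_f1(reference: Set[Span], other: Set[Span]) -> float:
    """F1 of `other` against `reference`, counting exact span+label matches."""
    if not reference and not other:
        return 1.0  # neither annotator marked anything: treat as full agreement
    matched = len(reference & other)
    if matched == 0:
        return 0.0
    precision = matched / len(other)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for one article (not real MEN data).
annotator_a = {(0, 5, "PERSON"), (10, 18, "ORG"), (25, 30, "LOC")}
annotator_b = {(0, 5, "PERSON"), (10, 18, "LOC"), (25, 30, "LOC"), (40, 44, "ORG")}

score = pairwise_f1(annotator_a, annotator_b)
print(round(score, 3))
```

With exact matching the score is symmetric in the two annotators, so it does not matter which one is treated as the reference. The paper may use a different matching rule (for example, partial span overlap).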

patrickamadeus commented 3 months ago

self-assign