SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for XED #533

Closed SamuelCahyawijaya closed 2 months ago

SamuelCahyawijaya commented 4 months ago

Dataloader name: xed/xed.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?xed

Dataset: xed
Description: XED is a multilingual fine-grained emotion dataset. It consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages.
Subsets: XED-VI
Languages: vie
Tasks: Sentiment Analysis
License: Unknown (unknown)
Homepage: https://github.com/Helsinki-NLP/XED/blob/master/subtitle-retrieval/students/pairs-vi.txt
HF URL: -
Paper URL: https://aclanthology.org/2020.coling-main.575.pdf

khelli07 commented 4 months ago

self-assign

khelli07 commented 4 months ago

I think this dataset has Indonesian as well: https://github.com/Helsinki-NLP/XED/blob/master/subtitle-retrieval/students/pairs-id.txt. Should we include it or not?
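
The Vietnamese and Indonesian files share one naming pattern (`pairs-vi.txt`, `pairs-id.txt`), so a loader could derive each subset's URL from its language code. Below is a minimal sketch of that idea, not the actual dataloader: the helper `pairs_url` and the mapping `FILE_CODES` are hypothetical names introduced for illustration, and only the two URLs mentioned in this thread are assumed to exist.

```python
# Hypothetical helper for illustration; not part of the XED repo or SEACrowd.
BASE_URL = "https://github.com/Helsinki-NLP/XED/blob/master/subtitle-retrieval/students"

# ISO 639-3 code (as in the "Languages" field) -> two-letter code used in the file name.
# Only the two languages discussed in this issue are listed.
FILE_CODES = {"vie": "vi", "ind": "id"}


def pairs_url(lang: str) -> str:
    """Return the pairs-file URL for a supported ISO 639-3 language code."""
    try:
        code = FILE_CODES[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None
    return f"{BASE_URL}/pairs-{code}.txt"


if __name__ == "__main__":
    for lang in FILE_CODES:
        print(lang, pairs_url(lang))
```

Note that these are GitHub page URLs as linked in the issue; a real loader would probably need the raw file contents instead, and the file format would have to be checked before parsing.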

muhammadravi251001 commented 2 months ago

self-assign