SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for indonglish-dataset #475

Closed: SamuelCahyawijaya closed this issue 7 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: indonglish/indonglish.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indonglish

Dataset indonglish
Description Indonglish-dataset was constructed from keywords derived from a sociolinguistic phenomenon observed among teenagers in South Jakarta. The dataset was designed for sentiment analysis, with three label categories: positive, negative, and neutral. It was annotated by a panel of five annotators, each with expertise in language and data science.
Subsets -
Languages ind
Tasks Sentiment Analysis
License Unknown (unknown)
Homepage https://github.com/laksmitawidya/indonglish-dataset
HF URL https://huggingface.co/TalTechNLP/voxlingua107-epaca-tdnn
Paper URL https://thesai.org/Publications/ViewPaper?Volume=14&Issue=10&Code=IJACSA&SerialNo=53

zwenyu commented 7 months ago

self-assign

zwenyu commented 7 months ago

@SamuelCahyawijaya I can't find how the dataset in https://huggingface.co/TalTechNLP/voxlingua107-epaca-tdnn is used in this work. Should I implement the dataloader for the dataset in https://github.com/laksmitawidya/indonglish-dataset instead?