SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for SentiBahasaRojak #497

Status: Closed (closed by fajri91 3 months ago)

fajri91 commented 5 months ago

Dataloader name: senti_bahasa_rojak/senti_bahasa_rojak.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?senti_bahasa_rojak

Dataset: senti_bahasa_rojak
Description: This dataset contains reviews of products, movies, and stocks written in Bahasa Rojak, a popular Malaysian dialect that mixes English, Malay, and Chinese. Each review is manually annotated as positive (bullish for stocks) or negative (bearish for stocks). The reviews are generated through data augmentation from English and Malay sentiment analysis datasets.
Subsets: movie, product, stock
Languages: zlm, eng, cmn
Tasks: Sentiment Analysis
License: NA
Homepage: https://data.depositar.io/dataset/brcc_and_sentibahasarojak/resource/8a558f64-98ff-4922-a751-0ce2ce8447bd
HF URL: -
Paper URL: https://aclanthology.org/2022.coling-1.389.pdf
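
The description above uses two label vocabularies (positive/negative for movie and product, bullish/bearish for stock), so a loader would probably want to map them onto one binary scheme. Here is a minimal sketch of that mapping. `SUBSETS`, `LABEL_MAP`, and `normalize_label` are hypothetical names, not part of seacrowd-datahub, and the raw label strings are an assumption that should be checked against the actual files on the homepage.

```python
# Hypothetical helper; not part of the seacrowd-datahub API.
# The raw label strings below are assumed, not taken from the dataset files.
SUBSETS = ("movie", "product", "stock")

# Stock reviews are described as bullish/bearish; the other subsets as
# positive/negative. Both vocabularies collapse to the same two classes.
LABEL_MAP = {
    "positive": "positive",
    "negative": "negative",
    "bullish": "positive",
    "bearish": "negative",
}


def normalize_label(subset: str, raw_label: str) -> str:
    """Map a raw sentiment label to 'positive' or 'negative'."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    key = raw_label.strip().lower()
    if key not in LABEL_MAP:
        raise ValueError(f"unknown label: {raw_label!r}")
    return LABEL_MAP[key]
```

Raising on unknown subsets or labels, rather than silently defaulting, should make any mismatch between these assumed strings and the real data visible early.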

khelli07 commented 5 months ago

self-assign

khelli07 commented 4 months ago

WIP today!