SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0
63 stars, 57 forks

Create dataset loader for Shopee Reviews Tagalog #108

Closed: SamuelCahyawijaya closed this issue 9 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: shopee_reviews_tagalog/shopee_reviews_tagalog.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?shopee_reviews_tagalog
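As a rough illustration of what the loader has to do for a text-classification task, each source review can be turned into an id/text/label example. This is a minimal sketch, not the SEACrowd loader itself: the source field names `text` and `label`, the assumption that `label` holds the star rating 1 to 5 (rather than a 0-based class index), and the helper `to_text_example` are all hypothetical.

```python
# Hypothetical label set: one string label per star rating.
LABEL_NAMES = ["1", "2", "3", "4", "5"]


def to_text_example(idx, row):
    """Convert one source row into an {id, text, label} example.

    Assumes (hypothetically) that `row` has a "text" field and a "label"
    field holding the star rating as an integer from 1 to 5.
    """
    star = int(row["label"])
    if str(star) not in LABEL_NAMES:
        raise ValueError(f"unexpected star rating: {star}")
    return {"id": str(idx), "text": row["text"].strip(), "label": str(star)}


print(to_text_example(0, {"text": " Maganda ang item. ", "label": 5}))
```

If the Hugging Face copy stores labels as 0-based class indices instead, the conversion would need a `+ 1` offset; check the dataset's features before relying on this sketch.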

Dataset: shopee_reviews_tagalog
Description: The Shopee reviews tl 15 dataset was constructed by randomly sampling, for each star rating from 1 to 5, 2,100 reviews for training and 450 reviews each for validation and testing. In total, there are 10,500 training samples and 2,250 samples each in the validation and test splits.
Subsets: -
Languages: tgl, fil
Tasks: Text Classification
License: Mozilla Public License 2.0 (mpl-2.0)
Homepage: https://huggingface.co/datasets/scaredmeow/shopee-reviews-tl-stars
HF URL: https://huggingface.co/datasets/scaredmeow/shopee-reviews-tl-stars
Paper URL: https://uijrt.com/articles/v4/i8/UIJRTV4I80009.pdf
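The split sizes in the description follow from per-star stratified sampling: 2,100 × 5 = 10,500 training samples and 450 × 5 = 2,250 each for validation and test. The sketch below reproduces that procedure on synthetic data. It is an illustration of the described sampling, not the dataset author's actual script; the helper name `stratified_split` and the seeding choice are assumptions.

```python
import random
from collections import defaultdict

# Per-star counts taken from the dataset description.
TRAIN_PER_STAR = 2100
VAL_PER_STAR = 450
TEST_PER_STAR = 450
STARS = range(1, 6)


def stratified_split(reviews, seed=0):
    """Split (text, star) pairs into train/validation/test with a fixed
    number of samples per star rating. Hypothetical helper."""
    by_star = defaultdict(list)
    for text, star in reviews:
        by_star[star].append((text, star))

    rng = random.Random(seed)
    needed = TRAIN_PER_STAR + VAL_PER_STAR + TEST_PER_STAR
    splits = {"train": [], "validation": [], "test": []}
    for star in STARS:
        pool = by_star[star]
        if len(pool) < needed:
            raise ValueError(f"star {star}: need {needed} reviews, got {len(pool)}")
        chosen = rng.sample(pool, needed)  # random draw without replacement
        splits["train"].extend(chosen[:TRAIN_PER_STAR])
        splits["validation"].extend(chosen[TRAIN_PER_STAR:TRAIN_PER_STAR + VAL_PER_STAR])
        splits["test"].extend(chosen[TRAIN_PER_STAR + VAL_PER_STAR:])
    return splits


if __name__ == "__main__":
    # Synthetic placeholder reviews: 3,000 per star.
    fake = [(f"review {i}", star) for star in STARS for i in range(3000)]
    for name, rows in stratified_split(fake).items():
        print(name, len(rows))  # train 10500, validation 2250, test 2250
```

Sampling each star independently keeps the five classes exactly balanced in every split, which is what the description's per-star counts imply.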
ljvmiranda921 commented 10 months ago

self-assign