SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for SPAMID-PAIR #277

Closed. SamuelCahyawijaya closed this issue 9 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: spamid-pair/spamid-pair.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?spamid-pair

Dataset: spamid-pair
Description: SPAMID-PAIR is a dataset of post-comment pairs collected from the accounts of 13 selected Indonesian public figures (artists) and public accounts, each with more than 15 million followers and categorized as famous artists. The data was collected from Instagram using an online tool and Selenium. Two people labeled all pairs as experts, for a total of 72,874 pairs. The data contains Unicode text (UTF-8) and emojis scraped from posts and comments, without account profile information.
Subsets: -
Languages: ind
Tasks: Text Classification
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://data.mendeley.com/datasets/fj5pbdf95t/1
HF URL: -
Paper URL: https://dx.doi.org/10.14569/IJACSA.2022.0131110
Alex-HaochenLi commented 10 months ago

self-assign