IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for ID-HSD-Nofaaulia #225

Closed by SamuelCahyawijaya 2 years ago

SamuelCahyawijaya commented 2 years ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_hsd_nofaaulia

Dataset: id_hsd_nofaaulia
Description: There have been many studies on detecting hate speech in short documents such as Twitter data. To our knowledge, however, research on long documents is rare; we suppose the difficulty increases because the message of the text may be hidden. In this research, we explore detecting hate speech in Indonesian long documents using a machine learning approach. We build a new Indonesian hate speech dataset from Facebook.
License: Unknown

SamuelCahyawijaya commented 2 years ago

The dataset split is defined in the code here: https://github.com/nofaulia/hate-speech-detection/blob/main/experiment.py

wenliangdai commented 2 years ago

self-assign

khelli07 commented 2 years ago

self-assign

IvanHalimP commented 2 years ago

self-assign