SamuelCahyawijaya commented 2 years ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_hsd_nofaaulia

Dataset	id_hsd_nofaaulia
Description	There have been many studies on detecting hate speech in short documents such as Twitter data. To our knowledge, however, research on long documents is rare; we suppose the task is harder because the message of the text may be hidden. In this research, we explore detecting hate speech in Indonesian long documents using a machine learning approach. We build a new Indonesian hate speech dataset from Facebook.
License	Unknown

SamuelCahyawijaya commented 2 years ago

The dataset split is defined in the code here: https://github.com/nofaulia/hate-speech-detection/blob/main/experiment.py

wenliangdai commented 2 years ago

self-assign

khelli07 commented 2 years ago

self-assign

IvanHalimP commented 2 years ago

IndoNLP / nusa-crowd

Create dataset loader for ID-HSD-Nofaaulia #225

self-assign
