Create dataset loader for NusaParagraph Rhetoric

IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.

Apache License 2.0

Closed by SamuelCahyawijaya 11 months ago

SamuelCahyawijaya commented 1 year ago

Dataset	nusa_paragraph_rhetoric
Description	NusaParagraph is a human-written paragraph dataset that covers 10 local languages in Indonesia. The dataset consists of around 50,000 paragraphs, each with around 100 tokens, resulting in a total of 6M tokens. The dataset is labelled with topic, emotion, and paragraph type.
License	CC-BY-NC-SA 4.0

SamuelCahyawijaya commented 11 months ago