IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for Sampiran #330

Closed: SamuelCahyawijaya closed this issue 1 year ago

SamuelCahyawijaya commented 1 year ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?sampiran

Dataset: sampiran
Description: Sampiran is a dataset for pantun generation. It consists of 7.8K Indonesian pantun collected from various online sources. Pantun is a traditional Malay poem consisting of four lines: two lines of deliverance and two lines of message. The collected pantun were filtered to follow the general rules of the form: four lines with an ABAB rhyme scheme and eight to twelve syllables per line.
License: AGPL-3.0
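
For anyone writing or testing the loader, the filtering rules in the description (four lines, ABAB rhyme, eight to twelve syllables per line) can be sketched roughly as below. This is only an illustration, not the filter the dataset authors used: the function names are hypothetical, syllables are approximated by counting vowel letters, and rhyme is approximated by comparing the last two letters of each line.

```python
import re

VOWELS = "aeiou"


def count_syllables(line: str) -> int:
    """Rough syllable count: one syllable per vowel letter (assumed heuristic)."""
    return sum(ch in VOWELS for ch in line.lower())


def rhyme_key(line: str, n: int = 2) -> str:
    """Last n letters of the line, ignoring case, spaces, and punctuation."""
    letters = re.sub(r"[^a-z]", "", line.lower())
    return letters[-n:]


def is_valid_pantun(text: str, min_syll: int = 8, max_syll: int = 12) -> bool:
    """Check the three rules from the description on a newline-separated pantun."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Rule 1: exactly four non-empty lines.
    if len(lines) != 4:
        return False
    # Rule 2: every line has between min_syll and max_syll syllables.
    if not all(min_syll <= count_syllables(ln) <= max_syll for ln in lines):
        return False
    # Rule 3: ABAB rhyme, approximated as lines 1/3 and 2/4 sharing an ending.
    keys = [rhyme_key(ln) for ln in lines]
    return keys[0] == keys[2] and keys[1] == keys[3]
```

Calling `is_valid_pantun` on a four-line string returns a boolean. The vowel-counting heuristic overcounts words with diphthongs, so the syllable bounds may need loosening in practice.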

SamuelCahyawijaya commented 1 year ago

self-assign

haryoa commented 1 year ago

self-assign