IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for IndoPuisi #125

Closed: SamuelCahyawijaya closed this issue 2 years ago

SamuelCahyawijaya commented 2 years ago

https://indonlp.github.io/nusa-catalogue/card.html?indopuisi

ilhamfp commented 2 years ago

self-assign

ilhamfp commented 2 years ago

Hi @SamuelCahyawijaya 👋 I need your guidance on this one. What's the appropriate nusantara schema/task for this dataset? Is it SELF_SUPERVISED_PRETRAINING? So far I have only used this dataset for training GPT here.

SamuelCahyawijaya commented 2 years ago

Hi @ilhamfp 👋. Thanks for contributing! Yeah, I agree. Since the data is unlabelled, I think the most suitable task is SELF_SUPERVISED_PRETRAINING.