IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for KaWAT #204

Closed by SamuelCahyawijaya 1 year ago

SamuelCahyawijaya commented 2 years ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?kawat

Dataset: kawat
Description: We introduced KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We evaluated several existing pretrained Indonesian word embeddings, as well as embeddings trained on an Indonesian online news corpus, on this dataset. We also tested them on two downstream tasks and found that pretrained word embeddings helped, either by reducing the number of training epochs or by yielding significant performance gains.
License: Apache 2.0
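
For anyone picking this up who has not worked with word analogy data before, here is a rough sketch of how an analogy item ("a is to b as c is to ?") is commonly scored with the vector-offset method. It is only an illustration: the function names, the toy 2-d vectors, and the example question are all made up here, and this is not necessarily the evaluation protocol or file format used by the KaWAT authors or expected by the nusa-crowd loader.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset method.

    Picks the vocabulary word closest (by cosine) to b - a + c,
    excluding the three query words themselves.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(
        solve_analogy(embeddings, a, b, c) == expected
        for a, b, c, expected in questions
    )
    return correct / len(questions)


# Hypothetical toy vectors, invented purely for illustration.
toy_embeddings = {
    "raja": [1.0, 1.0],     # king
    "ratu": [1.0, -1.0],    # queen
    "pria": [0.2, 1.0],     # man
    "wanita": [0.2, -1.0],  # woman
    "kota": [-1.0, 0.1],    # city (distractor)
}

# Hypothetical question: pria is to wanita as raja is to ratu.
toy_questions = [("pria", "wanita", "raja", "ratu")]

if __name__ == "__main__":
    print(solve_analogy(toy_embeddings, "pria", "wanita", "raja"))
    print(analogy_accuracy(toy_embeddings, toy_questions))
```

A real evaluation would swap `toy_embeddings` for vectors loaded from a pretrained model and `toy_questions` for the rows produced by the dataset loader.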

rizkyramadhana26 commented 2 years ago

self-assign

muhsatrio commented 2 years ago

Hi @rizkyramadhana26, are you still working on this task? Thank you!

muhsatrio commented 1 year ago

self-assign