IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for Parallel: Indonesian - Lampung Nyo #42

Closed: SamuelCahyawijaya closed this issue 1 year ago

SamuelCahyawijaya commented 2 years ago

https://indonlp.github.io/nusa-catalogue/card.html?parallel_id_nyo

haryoa commented 1 year ago

Can I take this issue?

haryoa commented 1 year ago

self-assign

haryoa commented 1 year ago

The dataset provider gives us the data as a PDF. Unfortunately, the rows of the two languages do not line up. From my observation, the mismatch begins at row 1729. Is it okay to publish the data partially (up to row 1729)?

I will create the partial dataloader, then.
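
For reference, a rough sketch of what "partial" could mean here, assuming the PDF text has already been extracted into two lists of rows, one per language. The function name, the field names, and the choice to treat row 1729 as the last kept row are all hypothetical; this is not the actual nusa-crowd loader.

```python
from typing import Dict, List

# Row number taken from the comment above. Whether row 1729 itself is still
# aligned is an assumption; lower the cutoff by one if it is not.
CUTOFF_ROW = 1729


def build_partial_pairs(
    ind_rows: List[str],
    nyo_rows: List[str],
    cutoff: int = CUTOFF_ROW,
) -> List[Dict[str, str]]:
    """Pair the first `cutoff` rows of each side and drop everything after."""
    # Never read past the end of the shorter side.
    n = min(cutoff, len(ind_rows), len(nyo_rows))
    return [
        {"id": str(i + 1), "indonesian": ind.strip(), "lampung_nyo": nyo.strip()}
        for i, (ind, nyo) in enumerate(zip(ind_rows[:n], nyo_rows[:n]))
    ]
```

Keeping the cutoff as a parameter makes it easy to extend the loader later if the remaining rows get realigned.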