IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0
260 stars, 61 forks

Closes #42: Create dataset loader for parallel Indonesian–Lampung Nyo dataset #339

Closed: haryoa closed this 1 year ago

haryoa commented 1 year ago

Closes #42

The data is incomplete (only 1729 lines). The original dataset is only available as a PDF, so I extracted the text with a PDF extractor and then manually removed the lines that were not aligned. I host the extracted data in my own repository.
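The unaligned lines were removed by hand in this PR. As a hedged illustration only, here is a minimal sketch of the kind of alignment check a loader for line-aligned parallel text might apply. It assumes the Indonesian and Lampung Nyo sides arrive as two lists of lines where line *i* on one side translates line *i* on the other. The function name `generate_parallel_examples` and the field names `id`, `text_1`, and `text_2` are hypothetical and are not taken from this PR's code.

```python
from typing import Dict, Iterator, List, Tuple


def generate_parallel_examples(
    src_lines: List[str], tgt_lines: List[str]
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (key, example) pairs from two line-aligned sentence lists.

    Hypothetical helper: assumes src_lines[i] (Indonesian) and
    tgt_lines[i] (Lampung Nyo) are translations of each other.
    """
    # A length mismatch means the two sides cannot be line-aligned.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"Sides are not aligned: {len(src_lines)} vs {len(tgt_lines)} lines"
        )
    key = 0
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty (e.g. leftover PDF blank lines).
        if not src or not tgt:
            continue
        yield key, {"id": str(key), "text_1": src, "text_2": tgt}
        key += 1
```

Usage: `list(generate_parallel_examples(["id sentence 1"], ["lampung sentence 1"]))` yields a single example with key `0`. Raising on a length mismatch, rather than silently truncating with `zip`, surfaces an extraction error early instead of producing shifted and misaligned pairs.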
