Create dataset loader for Korpus Nusantara

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?korpus_nusantara

Dataset	korpus_nusantara
Description	The dataset combines several machine translation works by the author, Herry Sujaini, covering translation from Indonesian to 25 local dialects in Indonesia. Since not all of these dialects have an ISO 639-3 code, we agreed with Pak Herry to group the data by closest language family: Javanese, Dayak, Buginese, Sundanese, Madurese, Banjar, Batak Toba, Khek, Malay, Minangkabau, and Tiociu.
License	Unknown
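
As a rough sketch of how the grouping in the description could translate into loader subsets, the snippet below enumerates one Indonesian-to-group pair per language family. The subset naming scheme (`korpus_nusantara_indonesian_<group>`) and the helper names `subset_id` and `translation_pairs` are assumptions made for illustration only; they are not taken from the nusa-crowd codebase, and no ISO 639-3 codes are assigned here because the description notes that not all dialects have one.

```python
# The 11 language groups listed in the dataset description.
LANGUAGE_GROUPS = [
    "Javanese", "Dayak", "Buginese", "Sundanese", "Madurese", "Banjar",
    "Batak Toba", "Khek", "Malay", "Minangkabau", "Tiociu",
]

# All translation pairs start from Indonesian, per the description.
SOURCE_LANGUAGE = "Indonesian"


def subset_id(group: str) -> str:
    """Build a hypothetical subset identifier for one language group."""
    slug = group.lower().replace(" ", "_")
    return f"korpus_nusantara_indonesian_{slug}"


def translation_pairs() -> list[tuple[str, str, str]]:
    """Return (source, target group, hypothetical subset id) for every group."""
    return [(SOURCE_LANGUAGE, group, subset_id(group)) for group in LANGUAGE_GROUPS]


if __name__ == "__main__":
    for source, target, name in translation_pairs():
        print(f"{source} -> {target}: {name}")
```

A real loader would likely replace the slugged group names with whatever language identifiers the project settles on for each group.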

IndoNLP / nusa-crowd

Create dataset loader for Korpus Nusantara #224

self-assign