IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0
261 stars 61 forks source link

Create dataset loader for INDspeech_NEWSTRA_EthnicSR #276

Closed SamuelCahyawijaya closed 1 year ago

SamuelCahyawijaya commented 1 year ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?indspeech_newstra_ethnicsr

Dataset indspeech_newstra_ethnicsr
Description INDspeech_NEWSTRA_EthnicSR is a collection of graphemically balanced and parallel speech corpora of four major Indonesian ethnic languages: Javanese, Sundanese, Balinese, and Bataks. It was developed in 2013 by the Nara Institute of Science and Technology (NAIST, Japan) [Sakti et al., 2013]. The data has been used to develop Indonesian ethnic speech recognition in supervised learning [Sakti et al., 2014] and semi-supervised learning [Novitasari et al., 2020] based on Machine Speech Chain framework [Tjandra et al., 2020].
License CC-BY-NC-SA 4.0
IvanHalimP commented 1 year ago

self-assign