Create dataset loader for INDspeech_NEWSTRA_EthnicSR

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?indspeech_newstra_ethnicsr

Dataset	indspeech_newstra_ethnicsr
Description	INDspeech_NEWSTRA_EthnicSR is a collection of graphemically balanced, parallel speech corpora covering four major Indonesian ethnic languages: Javanese, Sundanese, Balinese, and Batak. It was developed in 2013 by the Nara Institute of Science and Technology (NAIST, Japan) [Sakti et al., 2013]. The data has been used to develop Indonesian ethnic speech recognition with supervised learning [Sakti et al., 2014] and with semi-supervised learning [Novitasari et al., 2020] based on the Machine Speech Chain framework [Tjandra et al., 2020].
License	CC-BY-NC-SA 4.0

IndoNLP / nusa-crowd, issue #276

self-assign