IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Create dataset loader for INDspeech_NEWS_LVCSR #280

Status: Closed (SamuelCahyawijaya closed 1 year ago)

SamuelCahyawijaya commented 1 year ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?indspeech_news_lvcsr

Dataset: indspeech_news_lvcsr
Description: INDspeech_NEWS_LVCSR is the first Indonesian speech dataset for large vocabulary continuous speech recognition (LVCSR). The R&D Division of PT Telekomunikasi Indonesia developed the data in 2005-2006, in collaboration with the Advanced Telecommunication Research Institute International (ATR), Japan, as a continuation of the Asia-Pacific Telecommunity (APT) project [Sakti et al., 2004]. It has also been used successfully to develop Indonesian LVCSR in the Asian speech translation advanced research (A-STAR) project [Sakti et al., 2013].
License: CC-BY-NC-SA 4.0

IvanHalimP commented 1 year ago

self-assign