IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0
261 stars 61 forks source link

Create dataset loader for INDspeech_TELDIALOG_LVCSR #275

Closed SamuelCahyawijaya closed 1 year ago

SamuelCahyawijaya commented 1 year ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?indspeech_teldialog_lvcsr

Dataset indspeech_teldialog_lvcsr
Description INDspeech_TELDIALOG_LVCSR is one of the first Indonesian speech datasets for large vocabulary continuous speech recognition (LVCSR) based on telephon application. R&D Division of PT Telekomunikasi Indonesia developed the data in 2005-2006, in collaboration with Advanced Telecommunication Research Institute International (ATR) Japan, as the continuation of the Asia-Pacific Telecommunity (APT) project [Sakti et al., 2004]. It has also been successfully used for developing Indonesian LVCSR in the Asian speech translation advanced research (A-STAR) project [Sakti et al., 2013].
License CC-BY-NC-SA 4.0
jensan-1 commented 1 year ago

self-assign

ziweiji commented 1 year ago

self-assign