IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0
261 stars 61 forks source link

Create dataset loader for SU-ID TTS #281

Closed SamuelCahyawijaya closed 1 year ago

SamuelCahyawijaya commented 1 year ago

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?su_id_tts

Dataset su_id_tts
Description This data set contains high-quality transcribed audio data for Sundanese. The data set consists of wave files, and a TSV file. The file line_index.tsv contains a filename and the transcription of audio in the file. Each filename is prepended with a speaker identification number. The data set has been manually quality checked, but there might still be errors. This dataset was collected by Google in collaboration with Universitas Pendidikan Indonesia.
License CC-BY-SA 4.0
IvanHalimP commented 1 year ago

self-assign