SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ASR-IndoCSC: An Indonesian Conversational Speech Corpus #438

Closed: SamuelCahyawijaya closed this issue 7 months ago.

SamuelCahyawijaya commented 9 months ago

Dataloader name: asr_indocsc/asr_indocsc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?asr_indocsc

Dataset: asr_indocsc
Description: This open-source dataset consists of 4.54 hours of transcribed Indonesian conversational speech on certain topics, comprising seven conversations between two pairs of speakers.
Subsets: -
Languages: ind
Tasks: Text-to-Speech Synthesis, Automatic Speech Recognition
License: Creative Commons Attribution Non Commercial No Derivatives 4.0 (cc-by-nc-nd-4.0)
Homepage: https://magichub.com/datasets/indonesian-conversational-speech-corpus/
HF URL: -
Paper URL: -
zwenyu commented 8 months ago

self-assign