SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ASR-MALCSC: MALAY CONVERSATIONAL SPEECH CORPUS #442

Closed: SamuelCahyawijaya closed this issue 5 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: asr_malcsc/asr_malcsc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?asr_malcsc

Dataset: asr_malcsc
Description: This open-source dataset contains 5 hours of transcribed Malay conversational speech on selected topics, comprising ten conversations between five pairs of speakers.
Subsets: -
Languages: zlm
Tasks: Text-To-Speech Synthesis, Automatic Speech Recognition
License: Creative Commons Attribution Non Commercial No Derivatives 4.0 (cc-by-nc-nd-4.0)
Homepage: https://magichub.com/datasets/malay-conversational-speech-corpus/
HF URL: -
Paper URL: -
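A loader for this corpus will most likely need to split each long conversation transcript into utterance-level examples. The sketch below shows one way to do that using only the standard library. It is not the actual asr_malcsc.py implementation. The transcript line format `[start,end]<TAB>speaker_id<TAB>gender<TAB>text` is an assumption and should be checked against the files downloaded from the homepage. The function name `parse_transcript` and the `start`/`end` fields are hypothetical. The other field names loosely follow the id/path/text/speaker_id/metadata layout of SEACrowd's speech-text schema, so treat them as assumptions too.

```python
import re
from typing import Dict, Iterable, Iterator

# ASSUMED transcript line format (verify against the real corpus files):
#   [start,end]<TAB>speaker_id<TAB>gender<TAB>transcript
LINE_RE = re.compile(
    r"^\[(?P<start>[\d.]+),(?P<end>[\d.]+)\]\t"
    r"(?P<speaker>[^\t]+)\t(?P<gender>[^\t]+)\t(?P<text>.*)$"
)


def parse_transcript(lines: Iterable[str], audio_path: str) -> Iterator[Dict]:
    """Hypothetical helper: yield one utterance-level example per transcript line."""
    for idx, raw in enumerate(lines):
        match = LINE_RE.match(raw.rstrip("\n"))
        if match is None:
            continue  # skip header rows or lines that do not fit the assumed format
        text = match.group("text").strip()
        if not text:
            continue  # skip segments with no transcription
        yield {
            "id": f"{audio_path}_{idx}",
            "path": audio_path,  # the full conversation recording
            "start": float(match.group("start")),  # hypothetical: segment offsets (seconds)
            "end": float(match.group("end")),  # used to cut the utterance from the recording
            "text": text,
            "speaker_id": match.group("speaker"),
            "metadata": {
                "speaker_age": None,  # age is not assumed to be in the transcript
                "speaker_gender": match.group("gender"),
            },
        }
```

The loader's example generator could then iterate over the transcript files in each split and yield these dictionaries. It would slice the audio with the start and end offsets if the dataset stores only full-conversation recordings.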
zwenyu commented 7 months ago

self-assign