SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ASR-SINDODUSC: A SCRIPTED INDONESIAN DAILY-USE SPEECH CORPUS #439

Status: Closed (SamuelCahyawijaya closed this issue 7 months ago)

SamuelCahyawijaya commented 7 months ago

Dataloader name: asr_sindodusc/asr_sindodusc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?asr_sindodusc

Dataset: asr_sindodusc
Description: This open-source dataset consists of 3.5 hours of transcribed, scripted Indonesian speech focusing on daily-use sentences, comprising 3,296 utterances contributed by ten speakers.
Subsets: -
Languages: ind
Tasks: Text-To-Speech Synthesis, Automatic Speech Recognition
License: Creative Commons Attribution Non Commercial No Derivatives 4.0 (cc-by-nc-nd-4.0)
Homepage: https://magichub.com/datasets/indonesian-scripted-speech-corpus-daily-use-sentence/
HF URL: -
Paper URL: -
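
For whoever picks this up, here is a minimal sketch of how the metadata above could be captured in plain Python before writing the loader. It is not the SEACrowd loader API: the `DatasetCard` class and the `loader_path` helper are hypothetical names used only for illustration, and the field values are copied from this issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical container for the fields listed in this issue."""
    name: str
    description: str
    languages: List[str]
    tasks: List[str]
    license: str
    homepage: str
    hf_url: Optional[str] = None      # "-" in the issue, i.e. not available
    paper_url: Optional[str] = None   # "-" in the issue, i.e. not available


# Values copied from the issue body.
ASR_SINDODUSC = DatasetCard(
    name="asr_sindodusc",
    description=(
        "3.5 hours of transcribed, scripted Indonesian speech focusing on "
        "daily-use sentences; 3,296 utterances from ten speakers."
    ),
    languages=["ind"],
    tasks=["Text-To-Speech Synthesis", "Automatic Speech Recognition"],
    license="cc-by-nc-nd-4.0",
    homepage=(
        "https://magichub.com/datasets/"
        "indonesian-scripted-speech-corpus-daily-use-sentence/"
    ),
)


def loader_path(card: DatasetCard) -> str:
    """Build the dataloader path in the <name>/<name>.py form used above."""
    return f"{card.name}/{card.name}.py"


if __name__ == "__main__":
    print(loader_path(ASR_SINDODUSC))
```

The path helper simply mirrors the "Dataloader name" given at the top of this issue; the real loader should follow the repository's own template and contribution guide.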

zwenyu commented 7 months ago

self-assign