SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ASR-STIDUSC: A SCRIPTED THAI DAILY-USE SPEECH CORPUS #443

Closed: SamuelCahyawijaya closed this issue 7 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: asr_stidusc/asr_stidusc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?asr_stidusc

Dataset: asr_stidusc
Description: This open-source dataset consists of 4.56 hours of transcribed Thai scripted speech focusing on daily-use sentences, comprising 5,431 utterances contributed by ten speakers.
Subsets: -
Languages: tha
Tasks: Text-to-Speech Synthesis, Automatic Speech Recognition
License: Creative Commons Attribution Non Commercial No Derivatives 4.0 (cc-by-nc-nd-4.0)
Homepage: https://magichub.com/datasets/thai-scripted-speech-corpus-daily-use-sentence/
HF URL: -
Paper URL: -
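When implementing the loader, it can help to check the loaded data against the card's figures (5,431 utterances, 4.56 hours, ten speakers). Below is a minimal stdlib-only sketch, assuming the loader can yield per-utterance records with a speaker ID and a duration in seconds. The `Utterance` record, `summarize`, and `matches_card` are hypothetical helpers for illustration, not part of the SEACrowd codebase or the dataset's actual file layout.

```python
from dataclasses import dataclass
from typing import Iterable

# Figures taken from the dataset card in this issue.
EXPECTED_UTTERANCES = 5431
EXPECTED_HOURS = 4.56
EXPECTED_SPEAKERS = 10


@dataclass
class Utterance:
    # Hypothetical record shape; the real loader's fields may differ.
    speaker_id: str
    duration_s: float
    text: str


def summarize(utterances: Iterable[Utterance]) -> dict:
    """Aggregate utterance count, total hours, speaker count, and mean duration."""
    count = 0
    total_s = 0.0
    speakers = set()
    for utt in utterances:
        count += 1
        total_s += utt.duration_s
        speakers.add(utt.speaker_id)
    return {
        "utterances": count,
        "hours": total_s / 3600,
        "speakers": len(speakers),
        "mean_duration_s": total_s / count if count else 0.0,
    }


def matches_card(stats: dict, hour_tolerance: float = 0.01) -> bool:
    """Compare a summary with the card; hours are rounded on the card, so allow slack."""
    return (
        stats["utterances"] == EXPECTED_UTTERANCES
        and stats["speakers"] == EXPECTED_SPEAKERS
        and abs(stats["hours"] - EXPECTED_HOURS) <= hour_tolerance
    )
```

For reference, the card's figures imply a mean utterance length of about 4.56 × 3600 s / 5,431 = 16,416 s / 5,431 ≈ 3.02 s. The hour check uses a tolerance because the card reports duration rounded to two decimals.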
zwenyu commented 7 months ago

self-assign