SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Thai Elderly Speech dataset #590

Closed SamuelCahyawijaya closed 1 month ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: thai_elderly_speech/thai_elderly_speech.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?thai_elderly_speech

Dataset: thai_elderly_speech
Description: Version 1 of the Thai Elderly Speech dataset by Data Wow and VISAI aims to advance Automatic Speech Recognition (ASR) technology specifically for the elderly population. Researchers can use this dataset to advance ASR technology for healthcare and smart home applications. The dataset consists of 19,200 audio files, totaling 17 hours and 11 minutes of recorded speech. The files are divided into 2 categories: Healthcare (relating to medical issues and services in 30 medical categories) and Smart Home (relating to smart home devices in 7 household contexts). The dataset contains 5,156 unique sentences spoken by 32 seniors (10 males and 22 females), aged 57-60 years old (average age of 63 years).
Subsets: Healthcare, Smart Home
Languages: tha
Tasks: Automatic Speech Recognition
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://github.com/VISAI-DATAWOW/Thai-Elderly-Speech-dataset/releases/tag/v1.0.0, https://www.wang.in.th/dataset/64a228ab41c99c04544f2556
HF URL: -
Paper URL: -
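
For anyone picking this up, here is a rough sketch of how the card metadata above could be held in code before writing the loader. It is not the SEACrowd dataloader and does not use the SEACrowd or Hugging Face APIs; `DatasetCard`, `config_names`, and `mean_clip_seconds` are hypothetical names, and the per-subset naming scheme is an assumption for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical container for the fields listed in this issue."""
    name: str
    subsets: Tuple[str, ...]
    language: str
    task: str
    license: str
    num_files: int
    total_seconds: int


# Values copied from the issue description above.
CARD = DatasetCard(
    name="thai_elderly_speech",
    subsets=("Healthcare", "Smart Home"),
    language="tha",
    task="Automatic Speech Recognition",
    license="cc-by-sa-4.0",
    num_files=19_200,
    total_seconds=17 * 3600 + 11 * 60,  # 17 hours and 11 minutes
)


def config_names(card: DatasetCard) -> List[str]:
    # Assumed naming scheme: "<dataset>_<subset in snake_case>".
    return [f"{card.name}_{s.lower().replace(' ', '_')}" for s in card.subsets]


def mean_clip_seconds(card: DatasetCard) -> float:
    # Average recording length implied by the totals in the description.
    return card.total_seconds / card.num_files


if __name__ == "__main__":
    print(config_names(CARD))
    print(f"{mean_clip_seconds(CARD):.2f} s per file on average")
```

With the totals from the description, the average comes out to roughly 3.2 seconds per file, which is consistent with short, command-style utterances.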
akhdanfadh commented 3 months ago

self-assign