SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Kheng.info Speech #366

Closed by SamuelCahyawijaya 9 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: kheng_info_speech/kheng_info_speech.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?kheng_info_speech

Dataset: kheng_info_speech
Description: The Kheng.info Speech dataset was derived from recordings of Khmer words on the Khmer dictionary website kheng.info. The recordings were made by a native Khmer speaker and are short, generally between 1 and 2 seconds long.
Subsets: -
Languages: khm
Tasks: Automatic Speech Recognition
License: Unknown (unknown)
Homepage: https://huggingface.co/datasets/seanghay/khmer_kheng_info_speech
HF URL: https://huggingface.co/datasets/seanghay/khmer_kheng_info_speech
Paper URL: -
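
For whoever picks this up, here is a rough sketch of the per-record mapping such a loader might need. It is only an illustration: the source column names ("word", "audio") and the output field names are assumptions that have not been checked against the Hugging Face dataset or the SEACrowd schema, and `to_speech_text_example` is a hypothetical helper, not an existing datahub function.

```python
from typing import Any, Dict

# Metadata copied from the issue description.
DATASET_NAME = "kheng_info_speech"
LANGUAGES = ["khm"]
HF_URL = "https://huggingface.co/datasets/seanghay/khmer_kheng_info_speech"


def to_speech_text_example(index: int, record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one source record to a flat speech-text example.

    ASSUMPTION: the input keys "word" and "audio" and all output keys
    are hypothetical; verify them against the real dataset and schema.
    """
    audio = record["audio"]
    return {
        "id": str(index),            # running index used as the example id
        "path": audio.get("path"),   # may be None if the audio is in memory
        "audio": audio,              # passed through unchanged
        "text": record["word"],      # the recorded Khmer word
    }
```

A loader would presumably call this once per row while iterating over the source split, yielding `(index, example)` pairs.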

jensan-1 commented 10 months ago

self-assign