SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for NIE Corpus of Spoken Singapore English #510

Open SamuelCahyawijaya opened 8 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: nie_spoken_sg_eng/nie_spoken_sg_eng.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?nie_spoken_sg_eng

Dataset: nie_spoken_sg_eng
Description: This corpus contains short recordings and transcripts of (a) Singaporean subjects being interviewed by a British male teacher, and (b) readings of the fable "The North Wind and the Sun". Each speaker is given a short bio describing their sex, age, ethnic group, occupation, education, and languages spoken.
Subsets: interviews, the_north_wind_and_the_sun
Languages: eng
Tasks: Automatic Speech Recognition
License: Unknown (unknown)
Homepage: https://videoweb.nie.edu.sg/phonetic/niecsse/index.htm
HF URL: -
Paper URL: https://fass.ubd.edu.bn/staff/docs/DD/deterding-low-2005.pdf

mrqorib commented 7 months ago

self-assign

mrqorib commented 6 months ago

The data link has been inaccessible since last week, returning a "503 Service Unavailable" error. I emailed one of the authors (Low Ee Ling) on Monday but have not received a response. Does anyone know of an alternative host for this dataset?
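
For anyone who wants to re-check whether the link has come back, here is a minimal sketch using only the Python standard library. The helper names `probe` and `describe` are hypothetical and are not part of the SEACrowd codebase. It also assumes the server answers `HEAD` requests, which not every server does.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        # HEAD avoids downloading the body; this assumes the server supports it.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as 503) are raised as HTTPError but still carry a code.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL: no status code.
        return None


def describe(status: Optional[int]) -> str:
    """Turn a status code (or None) into a short human-readable label."""
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "available"
    if status == 503:
        return "service unavailable"
    return f"error {status}"
```

Calling `describe(probe("https://videoweb.nie.edu.sg/phonetic/niecsse/index.htm"))` would report the current state of the homepage listed above. If the server rejects `HEAD`, changing the method to `GET` is the obvious fallback.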