SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for TalkBankDB CHILDES #684

Open SamuelCahyawijaya opened 4 months ago

Dataloader name: talkbankdb_childes/talkbankdb_childes.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?talkbankdb_childes

Dataset: talkbankdb_childes
Description: The Child Language Data Exchange System (CHILDES) (https://childes.talkbank.org) is the child language component of the TalkBank system (https://www.talkbank.org). The data can be accessed through the TalkBankDB portal, through a Python API (see Homepage below), or through the package described here: https://link.springer.com/article/10.3758/s13428-018-1176-7. TalkBank is an interdisciplinary project that maintains an openly available database of recorded and transcribed spoken language interactions. It comprises a series of topic-specific databases for particular research areas: classroom discourse, aphasia, conversation analysis, Supreme Court, bilingualism, second language learning, dementia, child language, and five other more specific topic areas.
Subsets: Indonesian, Javanese, Manado Malay, Tagalog, Thai, Yau
Languages: ind, jav, xmm, tgl, tha, jau
Tasks: Automatic Speech Recognition
License: BSD 3-clause Clear license (bsd-3-clause-clear)
Homepage: https://github.com/TalkBank/TBDBpy
HF URL: -
Paper URL: https://direct.mit.edu/coli/article/26/4/657/1687
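
As a starting point for the dataloader, the subset and language fields above could be captured in a small lookup table. The sketch below pairs each subset with its language code in the order listed, and derives one pair of config names per language. The `config_names` helper and the `<dataset>_<lang>_source` / `<dataset>_<lang>_seacrowd_sptext` naming pattern are assumptions for illustration, not something specified in this issue; adjust them to whatever the datahub conventions require.

```python
# Subsets and language codes copied from the dataset card in this issue,
# paired in the order they are listed there.
SUBSET_TO_LANG = {
    "Indonesian": "ind",
    "Javanese": "jav",
    "Manado Malay": "xmm",
    "Tagalog": "tgl",
    "Thai": "tha",
    "Yau": "jau",
}

DATASET_NAME = "talkbankdb_childes"


def config_names(dataset_name, lang_codes, schema="sptext"):
    """Hypothetical helper: one source and one schema config name per language.

    The naming pattern and the "sptext" schema label are assumptions.
    """
    names = []
    for code in lang_codes:
        names.append(f"{dataset_name}_{code}_source")
        names.append(f"{dataset_name}_{code}_seacrowd_{schema}")
    return names


CONFIG_NAMES = config_names(DATASET_NAME, SUBSET_TO_LANG.values())
```

Keeping the mapping in one place means that adding or dropping a subset only requires touching `SUBSET_TO_LANG`.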