SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Alorese Collection #448

Closed SamuelCahyawijaya closed 5 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: alorese/alorese.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?alorese

Dataset: alorese
Description: The Alorese Collection (or Alorese Corpus) is a collection of language data in two varieties of Alorese (Alor and Pantar Alorese). It is available in video, audio, and text formats, with genres including experiment or task, stimuli, discourse, and written materials.
Subsets: -
Languages: aol, ind
Tasks: Language Modeling, Automatic Speech Recognition, Machine Translation
License: Unknown (unknown)
Homepage: https://hdl.handle.net/1839/e10d7de5-0a6d-4926-967b-0a8cc6d21fb1
HF URL: -
Paper URL: https://scholarlypublications.universiteitleiden.nl/handle/1887/70891
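
For whoever picks this up, the fields above could be captured as plain Python constants before wiring them into the loader. This is only a sketch: `DatasetCard`, `ALORESE_CARD`, and `loader_path` are hypothetical names made up for illustration and are not part of the seacrowd-datahub codebase. The values are copied from the table, and the path convention is inferred from the dataloader name given in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical container for the metadata fields listed in this issue."""

    name: str
    languages: List[str]
    tasks: List[str]
    license: str
    homepage: str
    paper_url: str
    hf_url: Optional[str] = None  # "-" in the issue, i.e. no HF URL given
    subsets: List[str] = field(default_factory=list)  # "-" in the issue


# Values copied verbatim from the issue; nothing here is inferred.
ALORESE_CARD = DatasetCard(
    name="alorese",
    languages=["aol", "ind"],
    tasks=[
        "Language Modeling",
        "Automatic Speech Recognition",
        "Machine Translation",
    ],
    license="Unknown (unknown)",
    homepage="https://hdl.handle.net/1839/e10d7de5-0a6d-4926-967b-0a8cc6d21fb1",
    paper_url="https://scholarlypublications.universiteitleiden.nl/handle/1887/70891",
)


def loader_path(card: DatasetCard) -> str:
    """Build '<name>/<name>.py', the pattern used by the dataloader name above."""
    return f"{card.name}/{card.name}.py"


if __name__ == "__main__":
    print(loader_path(ALORESE_CARD))
```

Keeping the metadata in one immutable object makes it easy to check against the catalogue card before the loader itself is written.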
patrickamadeus commented 7 months ago

self-assign