SamuelCahyawijaya commented 8 months ago

Dataloader name: pangloss_collection/pangloss_collection.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?pangloss_collection

Dataset	pangloss_collection
Description	The Pangloss Collection is an open archive of audio recordings of underdocumented languages across the world and their dialects, including languages from Cambodia, Laos, Myanmar and Vietnam. About half of all recordings are transcribed, annotated and translated. Many recordings are readings of vocabulary lists or are narratives about the speakers' lives.
Subsets	khm, cog, pcb, sxm, tpu, jra, thm, bru, kjg, pkt, nev, oog, hal, tnu, mya, kac, kkh, lhu, aem, crw, cje, kjm, mtq, zng, rgs, tyr, twh, tpo, vie
Languages	khm, cog, pcb, sxm, tpu, jra, thm, bru, kjg, pkt, nev, oog, hal, tnu, mya, kac, kkh, lhu, aem, crw, cje, kjm, mtq, zng, rgs, tyr, twh, tpo, vie
Tasks	Automatic Speech Recognition
License	Creative Commons Attribution-NonCommercial-ShareAlike 2.0 (cc-by-nc-sa-2.0)
Homepage	https://github.com/CNRS-LACITO/Pangloss_website/
HF URL	-
Paper URL	https://hal.science/hal-00005544

mrqorib commented 7 months ago

self-assign

mrqorib commented 6 months ago

I'm sorry, but I won't have enough time to work on this before the deadline, so I'm unassigning myself from this issue for now. I can work on it after the deadline if no one has picked it up by then and it is still considered important to include.

SEACrowd/seacrowd-datahub #511: Create dataset loader for Pangloss Collection
