SamuelCahyawijaya commented 11 months ago

Dataset	bloom_speech
Description	This version of the Bloom Library data is developed specifically for automatic speech recognition and speech-to-text tasks. It includes data from 56 languages across 18 language families. Eight of these languages are spoken in Southeast Asia.
Subsets	bjn, bzi, ceb, ind, jra, kqr, mya, tgl
Languages	bjn, bzi, ceb, ind, jra, kqr, mya, tgl
Tasks	Speech-to-Text Translation, Text-to-Speech Synthesis
License	Other (other)
Homepage	https://huggingface.co/datasets/sil-ai/bloom-speech
HF URL	https://huggingface.co/datasets/sil-ai/bloom-speech
Paper URL	https://aclanthology.org/2022.emnlp-main.590
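
For anyone picking this up, here is a minimal sketch of how one of the listed subsets might be loaded from the HF URL above. It assumes each subset is exposed as a config named by its language code and that a `train` split exists; neither is confirmed in this issue. `load_subset` is a hypothetical helper, not part of SEACrowd, and the dataset may require accepting its terms and authenticating with the Hub first.

```python
# Subset codes copied from the "Subsets" row of this issue.
SEA_SUBSETS = ["bjn", "bzi", "ceb", "ind", "jra", "kqr", "mya", "tgl"]

# Repository ID taken from the HF URL row of this issue.
HF_REPO_ID = "sil-ai/bloom-speech"


def load_subset(code, split="train", streaming=True):
    """Load one Southeast Asian subset (hypothetical helper for illustration)."""
    if code not in SEA_SUBSETS:
        raise ValueError(f"{code!r} is not one of the subsets listed in this issue")
    # Imported lazily so this snippet can be imported without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the config name equals the language code and the split exists.
    return load_dataset(HF_REPO_ID, code, split=split, streaming=streaming)
```

Calling `load_subset("bjn")` would then stream the Banjar subset, if the assumptions above hold. Streaming is used here only to avoid downloading all audio up front.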

sabilmakbar commented 11 months ago

self-assign

github-actions[bot] commented 11 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

sabilmakbar commented 10 months ago

Will draft a PR later today.

SEACrowd / seacrowd-datahub