Open linear[bot] opened 1 month ago
| Dataset Name | Link | Volume | Dialect | Notes |
| --- | --- | --- | --- | --- |
| Annotated Al Jazeera Dialectal Speech Corpus | https://arbml.github.io/masader/card?id=15 | 57 hours | Mixed | Missing/Inaccessible |
| Multi-Genre Broadcast (MGB-2) | https://arabicspeech.org/resources/mgb2 | 1200 hours | Mixed | |
| Multi-Genre Broadcast (MGB-3) | https://arabicspeech.org/resources/mgb3 | 15.8 hours | Egyptian | |
| Multi-Genre Broadcast (MGB-5) | https://arabicspeech.org/resources/mgb5 | 14 hours | Moroccan | |
| QASR | https://arabicspeech.org/resources/qasr | 2041 hours | Mixed | |
| ESCWA-CS | https://arabicspeech.org/resources/escwacs | 2.8 hours | Mixed | |
| Common Voice Dataset | https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0/ | 155.8 hours | | |
| MediaSpeech | https://huggingface.co/datasets/arbml/MediaSpeech_ar | 10 hours | | |
| WAW Corpus | https://alt.qcri.org/resources/wawcorpus/ | 0.5 hours | | Audio files missing |
| Arab-Andalusian music corpus | https://zenodo.org/records/1291776#.YqTFeHZBxD9 | 125 hours | | |
| MASC: Massive Arabic Speech Corpus | https://huggingface.co/datasets/pain/MASC | 1000 hours | Mixed | |
| QAC: Qatari Arabic Corpus | https://web.archive.org/web/20150918002143/http://sprosig.isle.illinois.edu/corpora/1 | 18.5 hours | Qatari | Dataset missing |
| ArabCeleb | https://github.com/CeLuigi/ArabCeleb | | | |
| Quran Speech: Imam + Users | https://github.com/tarekeldeeb/DeepSpeech-Quran/tree/master/data/quran | | | |
| SADA | https://www.kaggle.com/datasets/sdaiancai/sada2022 | | | |
| 400K Egyptian Arabic Lines | https://www.kaggle.com/datasets/fadisarwat/egyptian-arabic-lines | | | |
| ASR-EGARBCSC | https://magichub.com/datasets/egyptian-arabic-conversational-speech-corpus/ | | | |
| SciSoundArabia | https://www.kaggle.com/datasets/ghalebaa/scisoundarabia | | | |
| FLEURS | https://huggingface.co/datasets/google/fleurs/viewer/ar_eg | | Egyptian | |
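
A minimal triage sketch for the list above, in case it helps when narrowing down candidates. It only covers entries that state a volume, and it assumes that any non-empty Notes field means the dataset is currently unusable; that rule and the helper names (`split_by_availability`, `total_hours`) are illustrative, not part of any existing tooling.

```python
# Rows copied from the list above as (name, hours, notes).
# Entries with no stated volume are left out.
DATASETS = [
    ("Annotated Al Jazeera Dialectal Speech Corpus", 57.0, "Missing/Inaccessible"),
    ("Multi-Genre Broadcast (MGB-2)", 1200.0, ""),
    ("Multi-Genre Broadcast (MGB-3)", 15.8, ""),
    ("Multi-Genre Broadcast (MGB-5)", 14.0, ""),
    ("QASR", 2041.0, ""),
    ("ESCWA-CS", 2.8, ""),
    ("Common Voice Dataset", 155.8, ""),
    ("MediaSpeech", 10.0, ""),
    ("WAW Corpus", 0.5, "Audio files missing"),
    ("Arab-Andalusian music corpus", 125.0, ""),
    ("MASC: Massive Arabic Speech Corpus", 1000.0, ""),
    ("QAC: Qatari Arabic Corpus", 18.5, "Dataset Missing"),
]


def split_by_availability(rows):
    """Return (usable, flagged); a row is flagged when its notes field is non-empty."""
    usable = [row for row in rows if not row[2]]
    flagged = [row for row in rows if row[2]]
    return usable, flagged


def total_hours(rows):
    """Sum the stated volume, in hours, over the given rows."""
    return sum(hours for _, hours, _ in rows)


usable, flagged = split_by_availability(DATASETS)
print(f"usable:  {len(usable)} datasets, {total_hours(usable):.1f} h")
print(f"flagged: {len(flagged)} datasets, {total_hours(flagged):.1f} h")
```

The usable total is an upper bound on candidate audio: it does not check licensing or transcript quality, and it still counts the Arab-Andalusian music corpus, which is music rather than speech.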
WHI-15 Curate a list of good Arabic Datasets (potential candidates)