SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #480 | Add Dataloader Aya Collection - Translated #668

Status: Closed (muhammadravi251001 closed this 1 month ago)

muhammadravi251001 commented 1 month ago

Title: Add Dataloader Aya Collection - Translated

First line PR Message: Closes https://github.com/SEACrowd/seacrowd-datahub/issues/480

Test: You can check this dataloader implementation by copy-pasting the commands below (for a faster evaluation). Note that the evaluation will take a long time because the dataset is huge (some subsets are larger than 2 GB):

python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_ceb
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_tha
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_mya
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_zsm
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_jav
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_ind
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_vie
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_sun
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_ace
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_bjn
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_khm
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_lao
python -m tests.test_seacrowd seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py --subset_id aya_collection_translated_min
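
The thirteen commands above differ only in the language code at the end of `--subset_id`, so they could also be generated in a loop. The following is a minimal sketch, not part of the PR itself: the helper names `build_cmd` and `run_all` and the `DRY_RUN` switch are hypothetical, and it assumes the script is run from the repository root. By default it only prints the commands, since each real run may download a subset larger than 2 GB.

```shell
#!/usr/bin/env bash
# Sketch only: build_cmd, run_all and DRY_RUN are illustrative names,
# not part of the SEACrowd tooling. Run from the repository root.

DATALOADER="seacrowd/sea_datasets/aya_collection_translated/aya_collection_translated.py"
# Language codes taken from the subset IDs listed in this PR.
LANGS="ceb tha mya zsm jav ind vie sun ace bjn khm lao min"

# Print the test command for one language code.
build_cmd() {
  echo "python -m tests.test_seacrowd ${DATALOADER} --subset_id aya_collection_translated_$1"
}

# With DRY_RUN=1 (the default here) the commands are only printed;
# set DRY_RUN=0 to actually execute them one after another.
run_all() {
  for lang in $LANGS; do
    cmd="$(build_cmd "$lang")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      echo ">>> $cmd"
      # Keep going after a failure so one broken subset does not hide the rest.
      $cmd || echo "FAILED: $lang" >&2
    fi
  done
}

run_all
```

Running the script as is prints the same command lines as above; `DRY_RUN=0 bash script.sh` (with `script.sh` standing in for whatever file name you save it under) would execute them sequentially and report any failing subset on stderr.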


muhammadravi251001 commented 1 month ago

@muhammadravi251001 : LGTM!

Thanks for the approval, Sir!

muhammadravi251001 commented 1 month ago

Great work, @muhammadravi251001! Thanks a lot!!!

Thanks for the approval, kak!