SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #583 | Add Dataloader multilingual-NLI-26lang-2mil7 #598

Closed: akhdanfadh closed this pull request 5 months ago

akhdanfadh commented 6 months ago

Closes #583

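There are 10 subsets: two languages (`id`, `vi`) times five sources (`anli`, `fever`, `ling`, `mnli`, `wanli`). Config names look like `multilingual_nli_26lang_id_anli_source`, `multilingual_nli_26lang_vi_ling_seacrowd_pairs`, etc. When testing, pass `multilingual_nli_26lang_<subset>` to the `--subset_id` parameter, where `<subset>` has the form `<lang>_<source>` (e.g. `id_anli`).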
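A minimal sketch of the naming scheme, assuming the language and source lists are the same ones used in the test script in this PR. `print_subset_ids` is a hypothetical helper, not part of the dataloader; the snippet only prints the ten `--subset_id` values and the matching test commands (a dry run), it does not execute any tests.

```bash
#!/bin/bash
# Assumption: subset names follow <lang>_<source>, with the same lists
# as the test script in this PR.
DATASET="multilingual_nli_26lang"
LANGS=("id" "vi")
SOURCES=("anli" "fever" "ling" "mnli" "wanli")

# Hypothetical helper: print one full --subset_id value per line.
print_subset_ids() {
  local lang source
  for lang in "${LANGS[@]}"; do
    for source in "${SOURCES[@]}"; do
      echo "${DATASET}_${lang}_${source}"
    done
  done
}

# Dry run: print the test command for each subset without running it.
print_subset_ids | while read -r subset_id; do
  echo "python -m tests.test_seacrowd seacrowd/sea_datasets/${DATASET}/${DATASET}.py --subset_id=${subset_id}"
done
```

Appending `_source` or `_seacrowd_pairs` to any printed ID gives the corresponding config name, e.g. `multilingual_nli_26lang_id_anli_source`.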

Here is a script to test all subsets. To run it, save it to a file (e.g., `mnli_tests.sh`), make it executable with `chmod +x mnli_tests.sh`, and execute it with `./mnli_tests.sh`. Run the script from the seacrowd root directory.

```bash
#!/bin/bash
DATASET="multilingual_nli_26lang"
LANGS=("id" "vi")
SUBSETS=("anli" "fever" "ling" "mnli" "wanli")

mkdir -p "data/${DATASET}"

success_count=0
fail_count=0
declare -a failed_tests

for lang in "${LANGS[@]}"; do
  for subset in "${SUBSETS[@]}"; do
    subset_id="${lang}_${subset}"
    python_command="python -m tests.test_seacrowd seacrowd/sea_datasets/${DATASET}/${DATASET}.py --subset_id=${DATASET}_${subset_id}"
    output_file="data/${DATASET}/${subset_id}.txt"
    temp_output_file="data/${DATASET}/${subset_id}_temp.txt" # for cleaner cli output

    echo "Testing subset id: $subset_id"

    # run the test, save the output, and redirect verbose output to a temporary file
    script -q -c "$python_command" "$temp_output_file" > /dev/null
    cat "$temp_output_file" > "$output_file"
    rm "$temp_output_file"

    # check if the test was successful (unittest prints "OK" at the start of a line on success)
    if grep -q "^OK" "$output_file"; then
      echo "Test for $subset_id: SUCCESS"
      ((success_count++))
    else
      echo "Test for $subset_id: FAILURE"
      failed_tests+=("$subset_id")
      ((fail_count++))
    fi
  done
done

echo "-----------------------"
echo "SUMMARY: $((success_count + fail_count)) tests total"
echo "Success: $success_count"
echo "Failure: $fail_count"

if [ ${#failed_tests[@]} -gt 0 ]; then
  echo "Failed tests:"
  for test in "${failed_tests[@]}"; do
    echo "- $test"
  done
fi
```


akhdanfadh commented 5 months ago

@holylovenia Done!

yongzx commented 5 months ago

Everything runs on my end as well. I will merge this.