SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0
60 stars, 56 forks

Update subset composition of TydiQA | Close #465 #503

Closed: Gyyz closed this 5 months ago

Gyyz commented 5 months ago

Closes #465. Decomposes the TydiQA subsets into separate SelectP, MinSpan, and GoldP tasks.

The following checks passed:

  1. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_selectp_thai
  2. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_selectp_indonesian
  3. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_selectp
  4. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_id
  5. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_minspan
  6. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_minspan_thai
  7. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_minspan_indonesian
  8. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_goldp
  9. python -m tests.test_seacrowd seacrowd/sea_datasets/tydiqa/tydiqa.py --subset tydiqa_goldp_indonesian
  10. make check_file=seacrowd/sea_datasets/tydiqa/tydiqa.py
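The ten checks above can be re-run in one go with a small Python helper. This is a minimal sketch, assuming it is launched from the repository root (where the `tests.test_seacrowd` module and the Makefile live). `build_commands` and `run_all` are hypothetical helper names, not part of the repo. The subset names are copied from the list above, so they predate the later rename of `indonesian`/`thai` to `ind`/`tha`.

```python
import subprocess
import sys

# Dataloader path, taken from the commands listed above.
SCRIPT = "seacrowd/sea_datasets/tydiqa/tydiqa.py"

# Subset names exactly as used in the checks above (before the
# indonesian -> ind / thai -> tha rename).
SUBSETS = [
    "tydiqa_selectp_thai",
    "tydiqa_selectp_indonesian",
    "tydiqa_selectp",
    "tydiqa_id",
    "tydiqa_minspan",
    "tydiqa_minspan_thai",
    "tydiqa_minspan_indonesian",
    "tydiqa_goldp",
    "tydiqa_goldp_indonesian",
]


def build_commands(script: str, subsets: list[str]) -> list[list[str]]:
    """Hypothetical helper: one `python -m tests.test_seacrowd` call per
    subset, followed by the `make check_file=...` check."""
    cmds = [
        [sys.executable, "-m", "tests.test_seacrowd", script, "--subset", s]
        for s in subsets
    ]
    cmds.append(["make", f"check_file={script}"])
    return cmds


def run_all(cmds: list[list[str]]) -> list[str]:
    """Hypothetical helper: run every command (without stopping early)
    and return the ones that exited with a non-zero status."""
    failed = []
    for cmd in cmds:
        if subprocess.run(cmd).returncode != 0:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_all(build_commands(SCRIPT, SUBSETS))` returns the failing commands, so an empty list means every check passed. The loop deliberately does not stop at the first failure, so one broken subset does not hide problems in the others.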
Gyyz commented 5 months ago

@holylovenia wrote: Hi @Gyyz, thanks for the modifications! It works well on my side. One last nit, could you please replace the "Indonesian" and "thai" with "ind" and "that"?

Hi @holylovenia, is the “that” a typo for “tha”?

I updated the config names (indonesian --> ind, thai --> tha). The available configs are now: ['tydiqa_selectp_source', 'tydiqa_selectp_ind_source', 'tydiqa_selectp_tha_source', 'tydiqa_minspan_source', 'tydiqa_minspan_ind_source', 'tydiqa_minspan_tha_source', 'tydiqa_goldp_source', 'tydiqa_goldp_ind_source', 'tydiqa_id_source', 'tydiqa_selectp_seacrowd_qa', 'tydiqa_selectp_ind_seacrowd_qa', 'tydiqa_selectp_tha_seacrowd_qa', 'tydiqa_minspan_seacrowd_qa', 'tydiqa_minspan_ind_seacrowd_qa', 'tydiqa_minspan_tha_seacrowd_qa', 'tydiqa_goldp_seacrowd_qa', 'tydiqa_goldp_ind_seacrowd_qa', 'tydiqa_id_seacrowd_qa']
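The renamed configs follow a regular task × language × schema pattern, where `ind` and `tha` are the ISO 639-3 codes for Indonesian and Thai. The sketch below regenerates the 18-name list from that pattern so the scheme can be checked at a glance. It is a reconstruction inferred from the printed list, not code taken from `tydiqa.py`; `TASK_LANGS`, `LANG_CODES`, and `config_names` are hypothetical names, and the real dataloader may build its configs differently.

```python
# Hypothetical reconstruction of the config naming scheme; inferred from
# the printed config list, NOT copied from tydiqa.py.

# Old language names -> ISO 639-3 codes adopted in this thread.
LANG_CODES = {"indonesian": "ind", "thai": "tha"}

# Language splits offered per task, as reflected in the list
# (None = all languages; GoldP has no Thai-only subset).
TASK_LANGS = {
    "selectp": [None, "indonesian", "thai"],
    "minspan": [None, "indonesian", "thai"],
    "goldp": [None, "indonesian"],
}

SCHEMAS = ["source", "seacrowd_qa"]


def config_names() -> list[str]:
    """Build names like 'tydiqa_<task>[_<lang>]_<schema>' in list order."""
    names = []
    for schema in SCHEMAS:
        for task, langs in TASK_LANGS.items():  # dicts keep insertion order
            for lang in langs:
                parts = ["tydiqa", task]
                if lang is not None:
                    parts.append(LANG_CODES[lang])
                parts.append(schema)
                names.append("_".join(parts))
        # The standalone "tydiqa_id" subset has no task/language split
        # and comes last for each schema.
        names.append(f"tydiqa_id_{schema}")
    return names
```

Keeping the language split in one table makes gaps such as the missing `tydiqa_goldp_tha_*` configs visible directly, rather than buried in a flat list of strings.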

holylovenia commented 5 months ago

Oops, yes! Thanks for understanding my ambiguous suggestion. 😂