IndoNLP / nusa-crowd

A collaborative project to collect datasets in Indonesian languages.
Apache License 2.0

Closes #290 | Adding two code mixed JV-ID datasets #300

Closed by fozziethebeat 1 year ago

fozziethebeat commented 1 year ago

Checkbox

Tested with

python -m tests.test_nusantara nusacrowd/nusa_datasets/code_mixed_mt/code_mixed_mt.py --subset_id code_mixed_mt_jav_ind
python -m tests.test_nusantara nusacrowd/nusa_datasets/code_mixed_senti/code_mixed_senti.py --subset_id code_mixed_senti_jav
python -m tests.test_nusantara nusacrowd/nusa_datasets/code_mixed_senti/code_mixed_senti.py --subset_id code_mixed_senti_ind

fozziethebeat commented 1 year ago

I merged the loaders.

Can you give advice on the preferred config names?

With the merge, I used:

The loaders all work, but the unittest library doesn't like this setup. No config name substring covers both tasks plus the source, so for any subset_id value, at least one of the config names the test constructs ends up missing.
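
To make the failure mode concrete, here is a minimal sketch of how it could arise. It assumes the test builds config names of the form `<subset_id>_source` and `<subset_id>_nusantara_<schema>`, with one name per schema for every task the loader declares. That naming pattern, the helper functions, and the config names in `available` are all hypothetical illustrations, not the project's actual test code or the names used in this PR.

```python
# Hypothetical sketch: the naming pattern and all config names are assumptions.

def expected_config_names(subset_id, schemas):
    """Build the config names a test harness might look up for one subset_id."""
    names = [f"{subset_id}_source"]
    names += [f"{subset_id}_nusantara_{schema}" for schema in schemas]
    return names


def missing_configs(subset_id, schemas, available):
    """Return the constructed names that the loader does not define."""
    return [n for n in expected_config_names(subset_id, schemas) if n not in available]


# A merged loader exposing one MT subset and one sentiment subset (illustrative names).
available = {
    "code_mixed_jv_id_mt_source",
    "code_mixed_jv_id_mt_nusantara_t2t",
    "code_mixed_jv_id_senti_source",
    "code_mixed_jv_id_senti_nusantara_text",
}

# Assumed: the harness derives one schema per task the loader supports.
all_schemas = ["t2t", "text"]

for subset_id in ("code_mixed_jv_id_mt", "code_mixed_jv_id_senti", "code_mixed_jv_id"):
    print(subset_id, "->", missing_configs(subset_id, all_schemas, available))
```

Under these assumptions, every choice of subset_id leaves at least one constructed name undefined: the MT subset has no `text` config, the sentiment subset has no `t2t` config, and the shared prefix matches none of them.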

SamuelCahyawijaya commented 1 year ago

/test dataset=code_mixed_jv_id subset_id=code_mixed_jv_id_id

github-actions[bot] commented 1 year ago

Run result

Check test log here: https://github.com/IndoNLP/nusa-crowd/actions/runs/3160630960