SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #32 (High Prio) | Extend XL Sum Dataloaders to SEA Langs #498

sabilmakbar closed 5 months ago

sabilmakbar commented 5 months ago

Please name your PR title and the first line of PR message after the issue it will close. You can use the following examples:

Closes #32


sabilmakbar commented 5 months ago

Test log for all available languages (all passed): test_xlsum.log

sabilmakbar commented 5 months ago

Hello, I got these errors when I tried to run the test. Can you check?

Or is there a specific command you use for testing this code?

That's because I didn't implement the xl_sum_source and xl_sum_seacrowd_t2t configs: the dataset is multilingual, so each language needs its own subset. Try passing the optional --subset_id argument when testing, like this:

python -m tests.test_seacrowd seacrowd/sea_datasets/xl_sum/xl_sum.py --subset_id xl_sum_ind
# this tests both xl_sum_ind_source and xl_sum_ind_seacrowd_t2t

You can do the same for the other languages (mya, tha, and vie).
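
To avoid typing the per-language command four times, the invocations can be generated in a loop. The following is only a rough sketch: the helper names (build_test_commands, run_all) are made up for illustration and are not part of the SEACrowd test suite, and it assumes the subset IDs follow the xl_sum_<lang> pattern shown above and that it is run from the repository root.

```python
import subprocess

# Language codes mentioned in this thread for the XL-Sum dataloader.
LANGS = ["ind", "mya", "tha", "vie"]
DATALOADER = "seacrowd/sea_datasets/xl_sum/xl_sum.py"


def build_test_commands(langs=LANGS):
    """Return one test command (as an argv list) per language subset.

    Hypothetical helper: assumes subset IDs are named xl_sum_<lang>.
    """
    return [
        ["python", "-m", "tests.test_seacrowd", DATALOADER,
         "--subset_id", f"xl_sum_{lang}"]
        for lang in langs
    ]


def run_all(langs=LANGS):
    """Run each command in turn and return the subset IDs that failed."""
    failed = []
    for cmd in build_test_commands(langs):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])  # last argv element is the subset ID
    return failed


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_test_commands():
        print(" ".join(cmd))
```

Calling run_all() instead of the dry-run loop would execute the tests one language at a time and report which subsets, if any, did not pass.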