SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #425 | Add Dataloader MaXM #554

akhdanfadh closed 4 months ago

akhdanfadh commented 6 months ago

Closes #425

The homepage does not specify any subsets, but there are two files per language: (1) regular QA and (2) yes-no QA. I assumed each should be its own subset (open to discussion). The configs will therefore look like this: maxm_regular_source, maxm_yesno_seacrowd_imqa, etc. When testing, pass maxm_<subset> to the --subset_id parameter.
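To make the naming scheme concrete, here is a minimal sketch of how the config names could be enumerated. The constants and the `config_names` helper are hypothetical and only illustrate the `maxm_<subset>_<schema>` pattern described above; they are not taken from the dataloader itself.

```python
# Hypothetical illustration of the config naming pattern, not the actual dataloader code.
SUBSETS = ["regular", "yesno"]           # one subset per file type (assumed)
SCHEMAS = ["source", "seacrowd_imqa"]    # schema suffixes seen in the config names above


def config_names(dataset="maxm"):
    """Return every <dataset>_<subset>_<schema> combination."""
    return [
        f"{dataset}_{subset}_{schema}"
        for subset in SUBSETS
        for schema in SCHEMAS
    ]


names = config_names()
for name in names:
    print(name)
```

Under these assumptions, the value passed to --subset_id is the name without the schema suffix, e.g. maxm_regular or maxm_yesno.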


akhdanfadh commented 5 months ago

I think there is a better way to indicate whether to use the "source" schema or the "imqa" schema than string concatenation. I suggest using a simple conditional instead.

@faridlazuarda I don't understand the problem here. Are you sure you are passing either maxm_regular or maxm_yesno to the --subset_id parameter when testing?

Edit:

ValueError: BuilderConfig 'maxm_regular_imqa_source' not found.

The problem is here: you passed maxm_regular_imqa instead of one of the subset IDs I mentioned above (maxm_regular or maxm_yesno).
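A rough sketch of why this fails, assuming (as the error message suggests, though it is not confirmed here) that the test script appends the schema suffix to whatever is passed as --subset_id. The `resolve_config` function and the `KNOWN_CONFIGS` set are hypothetical stand-ins for that lookup.

```python
# Hypothetical stand-in for the config lookup; names here are assumptions.
KNOWN_CONFIGS = {
    "maxm_regular_source",
    "maxm_regular_seacrowd_imqa",
    "maxm_yesno_source",
    "maxm_yesno_seacrowd_imqa",
}


def resolve_config(subset_id, schema="source"):
    """Append the schema suffix to subset_id and look up the resulting config name."""
    name = f"{subset_id}_{schema}"
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"BuilderConfig '{name}' not found.")
    return name


# Passing the bare subset ID resolves to a known config.
ok_name = resolve_config("maxm_regular")

# Passing an ID that already carries a schema fragment produces an unknown name.
try:
    resolve_config("maxm_regular_imqa")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

With this behavior, maxm_regular_imqa becomes maxm_regular_imqa_source after the suffix is appended, which matches the name in the error above.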