Closes #6 | Add Loader for XCOPA

FawwazMayda commented 9 months ago

Closes #6 | Extend dataloader for XCOPA to th (Thai) and vi (Vietnamese)

Checkbox

[x] Confirm that this PR is linked to the dataset issue.
[x] Create the dataloader script seacrowd/sea_datasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
[ ] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
[x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
[x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
[x] Confirm dataloader script works with datasets.load_dataset function.
[x] Confirm that your dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py.
[ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
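
As an illustration of the _generate_examples() item above, here is a minimal sketch of how one XCOPA source record might be mapped to a QA-style record. The source field names (premise, choice1, choice2, question, label, idx, changed) follow the public XCOPA dataset. The target keys are an assumption about the seacrowd QA schema, and to_qa_example is a hypothetical helper, not the code from this PR.

```python
from typing import Any, Dict


def to_qa_example(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one XCOPA source row to a QA-style dict (hypothetical helper)."""
    choices = [row["choice1"], row["choice2"]]
    return {
        # Target keys below are assumed, not confirmed by this PR.
        "id": str(row["idx"]),
        "question_id": str(row["idx"]),
        "document_id": str(row["idx"]),
        "question": row["question"],  # "cause" or "effect" in XCOPA
        "type": "multiple_choice",
        "choices": choices,
        "context": row["premise"],
        "answer": [choices[row["label"]]],  # label is 0 or 1
        "meta": {},
    }


# Illustrative record in the XCOPA source layout (text made up for the example).
sample = {
    "premise": "The man turned on the faucet.",
    "choice1": "The toilet filled with water.",
    "choice2": "Water flowed from the spout.",
    "question": "effect",
    "label": 1,
    "idx": 1,
    "changed": False,
}
example = to_qa_example(sample)
print(example["answer"])
```

Keeping the mapping in one small function makes it easy to reuse for every language subset, since the record layout is the same across languages.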

Ind: Source, QA

Th: Source, QA

Vie: Source, QA

sabilmakbar commented 9 months ago

Hi @FawwazMayda, thanks for contributing this dataloader!

May I know why the Indonesian language is being skipped? Both the datacard and the source dataset list id as one of the supported languages.

FawwazMayda commented 9 months ago

The existing seacrowd/xcopa loader already covers id, which is why I only implemented th and vie.

Should I rename it to xcopa_id instead of xcopa?

sabilmakbar commented 9 months ago

In that case, would you extend the script in sea_datasets/xcopa to cover tha and vie as well (and consequently remove xcopa_th and xcopa_vi)? This should be similar to a previously merged PR; you can refer to its implementation here: https://github.com/SEACrowd/seacrowd-datahub/pull/125
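
A minimal sketch of what covering several languages in one script could look like: one source config and one seacrowd config per language. The Config dataclass is a stand-in for SEACrowdConfig so the example runs on its own. The language codes, config naming pattern, schema name seacrowd_qa, and version strings are assumptions for illustration, not values taken from the merged script.

```python
from dataclasses import dataclass
from typing import List

_DATASETNAME = "xcopa"
_LANGUAGES = ["ind", "tha", "vie"]  # assumed subset codes
_SOURCE_VERSION = "1.0.0"  # placeholder value
_SEACROWD_VERSION = "1.0.0"  # placeholder value


@dataclass
class Config:
    """Stand-in for SEACrowdConfig; the field names here are assumed."""

    name: str
    version: str
    description: str
    schema: str
    subset_id: str


def build_configs() -> List[Config]:
    """Create a source config and a QA config for every language."""
    configs = []
    for lang in _LANGUAGES:
        subset_id = f"{_DATASETNAME}_{lang}"
        configs.append(
            Config(
                name=f"{subset_id}_source",
                version=_SOURCE_VERSION,
                description=f"{_DATASETNAME} {lang} source schema",
                schema="source",
                subset_id=subset_id,
            )
        )
        configs.append(
            Config(
                name=f"{subset_id}_seacrowd_qa",
                version=_SEACROWD_VERSION,
                description=f"{_DATASETNAME} {lang} SEACrowd schema",
                schema="seacrowd_qa",
                subset_id=subset_id,
            )
        )
    return configs


BUILDER_CONFIGS = build_configs()
print([config.name for config in BUILDER_CONFIGS])
```

Generating the configs in a loop keeps the per-language logic in one place, so adding another language is a one-line change to the language list.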

FawwazMayda commented 8 months ago

Updated, @sabilmakbar. I added all those languages into a single xcopa.py.

FawwazMayda commented 8 months ago

Adjusted based on your comments @holylovenia

sabilmakbar commented 8 months ago

Hi @FawwazMayda, can you run the formatter check and fix the issues it reports? I found some import and readability issues, such as trailing whitespace in a line, unused imports, and an internal variable reference in the config creation that should be renamed (l to lang).

make check_file=seacrowd/sea_datasets/xcopa/xcopa.py

FawwazMayda commented 8 months ago

updated @sabilmakbar
