Closes #623 | Add/Update Dataloader MedEV

patrickamadeus commented 2 months ago

Closes #623

Checkbox

[x] Confirm that this PR is linked to the dataset issue.
[x] Create the dataloader script seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py (please use only lowercase letters and underscores for the dataset folder name, as mentioned in the dataset issue) and its __init__.py within the {my_dataset} folder.
[x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _LOCAL, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _SEACROWD_VERSION variables.
[x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
[x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one SEACrowdConfig for the source schema and one for a seacrowd schema.
[x] Confirm dataloader script works with datasets.load_dataset function.
[x] Confirm that your dataloader script passes the test suite run with python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py or python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}.
[ ] If my dataset is local, I have provided the output of the unit tests in the PR (please copy and paste it). This is OPTIONAL for public datasets, as we can test these without access to the data files.
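As a rough illustration of the config requirement in the checklist, the sketch below uses a stand-in dataclass (`ConfigSketch`, hypothetical) instead of the real `SEACrowdConfig`, whose exact fields are not shown in this PR. It assumes the usual `<subset>_source` / `<subset>_seacrowd_<schema>` naming, with `t2t` as an assumed schema suffix for a parallel corpus and `medev` as a hypothetical dataset name. It also shows how the `--subset_id` value in the test command relates to those config names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConfigSketch:
    # Hypothetical stand-in for SEACrowdConfig; these field names are assumptions.
    name: str
    schema: str
    subset_id: str


def build_configs(dataset_name: str, seacrowd_schema: str = "t2t") -> List[ConfigSketch]:
    """Return one source-schema config and one seacrowd-schema config."""
    return [
        ConfigSketch(name=f"{dataset_name}_source", schema="source", subset_id=dataset_name),
        ConfigSketch(
            name=f"{dataset_name}_seacrowd_{seacrowd_schema}",
            schema=f"seacrowd_{seacrowd_schema}",
            subset_id=dataset_name,
        ),
    ]


def subset_id_from_name(config_name: str) -> str:
    """Strip the _source or _seacrowd_<schema> suffix, giving the value passed to --subset_id."""
    if config_name.endswith("_source"):
        return config_name[: -len("_source")]
    marker = "_seacrowd_"
    if marker in config_name:
        return config_name[: config_name.rindex(marker)]
    return config_name


configs = build_configs("medev")
print([c.name for c in configs])
print({subset_id_from_name(c.name) for c in configs})
```

Keeping the subset name as a shared prefix is what lets the test suite select both configs of one subset with a single `--subset_id` value.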

Tests

patrickamadeus commented 1 month ago

Hi @elyanah-aco! I've addressed all of the suggestions. I appreciate the detailed review!

I will address the suggestion from @akhdanfadh after getting your second opinion 🙏.

holylovenia commented 1 month ago

Also, adding to this: do we really want to leave the English text and its Vietnamese translation unpaired? I know the dataset viewer on the homepage shows the data stacked, but I think a dataloader should pair each sentence with its translation in the same example. Wdyt @elyanah-aco?
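A minimal sketch of the pairing suggested above, assuming the English and Vietnamese sides come as line-aligned lists (an assumption about the MedEV files) and that the seacrowd text-to-text schema uses `id`, `text_1`, `text_2`, `text_1_name` and `text_2_name` fields (also an assumption here). The helper `pair_parallel_lines` and the `eng`/`vie` labels are hypothetical, not the PR's actual code.

```python
from typing import Dict, Iterator, List, Tuple


def pair_parallel_lines(
    en_lines: List[str], vi_lines: List[str]
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (key, example) pairs keeping each English sentence with its Vietnamese translation."""
    if len(en_lines) != len(vi_lines):
        # Differing line counts mean the sides are not aligned, so zipping would mis-pair them.
        raise ValueError(
            f"Line counts differ: {len(en_lines)} English vs {len(vi_lines)} Vietnamese"
        )
    for idx, (en, vi) in enumerate(zip(en_lines, vi_lines)):
        # Field names follow the assumed text-to-text schema.
        yield idx, {
            "id": str(idx),
            "text_1": en.strip(),
            "text_2": vi.strip(),
            "text_1_name": "eng",
            "text_2_name": "vie",
        }


# Toy input, not taken from the dataset.
for key, example in pair_parallel_lines(["Hello.\n"], ["Xin chào.\n"]):
    print(key, example)
```

Pairing inside the generator means each yielded example is already a translation pair, so downstream users do not need to re-align two stacked lists themselves.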

> Hi @elyanah-aco! I've addressed all of the suggestions. I appreciate the detailed review!
>
> I will address the suggestion from @akhdanfadh after getting your second opinion 🙏.

A friendly reminder for @elyanah-aco in case she missed it.

patrickamadeus commented 1 month ago

Hi @akhdanfadh and @elyanah-aco! The minor language expansion is done. Thank you for all of the reviews! 🙏

holylovenia commented 1 month ago

Hi @akhdanfadh, I would like to let you know that we plan to finalize the calculation of open contributions (e.g., dataloader implementations) in 31 hours, so it would be great if we could wrap up the review and merge this PR before then.

cc: @patrickamadeus

SEACrowd / seacrowd-datahub

Closes #623 | Add/Update Dataloader MedEV #639
