[x] Confirm that this PR is linked to the dataset issue.
[x] Create the dataloader script hub/hub_repos/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
[x] Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
[x] Implement _info(), _split_generators() and _generate_examples() in dataloader script.
[x] Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
[x] Confirm dataloader script works with datasets.load_dataset function.
[x] Confirm that your dataloader script passes the test suite run with python -m tests.test_bigbio_hub <dataset_name> [--data_dir /path/to/local/data] --test_local.
[ ] If my dataset is local, I have provided an output of the unit-tests in the PR (please copy paste). This is OPTIONAL for public datasets, as we can test these without access to the data files.
Name: MedQA (med_qa)
Description: Multiple-choice medical board questions (this PR adds the 4-option subsets)
Paper: https://arxiv.org/abs/2009.13081
Data: https://github.com/jind11/MedQA
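For reviewers, here is a minimal sketch of the kind of record mapping the `_generate_examples()` item in the checklist refers to. It is not the code in this PR. It assumes each line of a 4-option file is a JSON object with `question`, `options` (a letter-to-text mapping), and `answer` keys, and it assumes a QA-style output layout with `id`, `question_id`, `document_id`, `question`, `type`, `choices`, `context`, and `answer` fields. The function name `generate_qa_examples` and the sample record are hypothetical and only serve as illustration.

```python
import json
from typing import Dict, Iterator, List, Tuple


def generate_qa_examples(lines: List[str]) -> Iterator[Tuple[int, Dict]]:
    """Yield (key, example) pairs from JSONL lines (hypothetical helper).

    Assumed input keys: "question", "options" (letter -> text), "answer".
    Assumed output layout: a QA-style dict, as described in the lead-in.
    """
    for key, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        options = record["options"]
        yield key, {
            "id": str(key),
            "question_id": str(key),
            "document_id": str(key),
            "question": record["question"],
            "type": "multiple_choice",
            # Order choices by option letter so A..D stay in a stable order.
            "choices": [options[letter] for letter in sorted(options)],
            "context": "",
            "answer": [record["answer"]],
        }


# Made-up sample record, for illustration only (not taken from the dataset).
sample = json.dumps(
    {
        "question": "Placeholder question text?",
        "options": {"B": "second", "A": "first", "D": "fourth", "C": "third"},
        "answer": "third",
    }
)
examples = list(generate_qa_examples([sample, ""]))
```

In an actual dataloader the same mapping would live inside `_generate_examples()` and read from the file path passed in by `_split_generators()`; a plain function is used here so the sketch runs without the `datasets` library or the data files.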