Closed akhdanfadh closed 4 months ago
I think there is a better way to indicate whether to use the "source" schema or the "imqa" schema than string concatenation. I suggest using a simple conditional instead.
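As a rough illustration of the suggestion, here is a minimal sketch that branches on the schema directly. `DemoConfig`, `select_schema_fields`, and the field names are hypothetical stand-ins and are not taken from this PR's dataloader.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DemoConfig:
    """Hypothetical stand-in for a dataloader config (not the real SEACrowdConfig)."""
    name: str
    schema: str  # expected to be "source" or "seacrowd_imqa"


def select_schema_fields(config: DemoConfig) -> List[str]:
    """Pick the output fields with a plain conditional on config.schema."""
    if config.schema == "source":
        # Illustrative field names only; the real source schema may differ.
        return ["image_id", "question", "answers"]
    elif config.schema == "seacrowd_imqa":
        # Illustrative field names only; the real imqa schema may differ.
        return ["id", "question_id", "questions", "answer"]
    # Fail loudly on anything else instead of building a key by concatenation.
    raise ValueError(f"Unsupported schema: {config.schema!r}")
```

Branching this way keeps each schema's handling explicit, and an unexpected schema value fails with a clear message.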
@faridlazuarda I don't understand the problem here. Are you sure you pass either `maxm_regular` or `maxm_yesno` to the `--subset_id` parameter for testing?
Edit:
ValueError: BuilderConfig 'maxm_regular_imqa_source' not found.
The problem is here: you passed `maxm_regular_imqa` instead of one of the subset IDs I mentioned above.
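To make the failure concrete, here is a small sketch of how a subset ID could expand into config names. It assumes the test runner appends a schema suffix to the value of `--subset_id`, which is what the error message suggests. The helper names are hypothetical and this is not the actual `tests.test_seacrowd` code.

```python
from typing import List

# Subset IDs and schema suffixes as named in this thread.
VALID_SUBSET_IDS = ("maxm_regular", "maxm_yesno")
SCHEMA_SUFFIXES = ("source", "seacrowd_imqa")

# Config names the dataloader is expected to define.
KNOWN_CONFIGS = {
    f"{subset_id}_{suffix}"
    for subset_id in VALID_SUBSET_IDS
    for suffix in SCHEMA_SUFFIXES
}


def config_names_for(subset_id: str) -> List[str]:
    """Assumed behavior: append each schema suffix to the subset ID."""
    return [f"{subset_id}_{suffix}" for suffix in SCHEMA_SUFFIXES]


def missing_configs(subset_id: str) -> List[str]:
    """Return the derived config names that the dataloader does not define."""
    return [name for name in config_names_for(subset_id) if name not in KNOWN_CONFIGS]
```

Under this assumption, `maxm_regular_imqa` expands to `maxm_regular_imqa_source`, which is not a defined config, while `maxm_regular` expands to names that all exist.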
Closes #425
There is no subset specified on the homepage, but there are two files for each language: (1) regular QA and (2) yes-no QA. I assumed each should be a subset (open to discussion). Thus, configs will look like this: `maxm_regular_source`, `maxm_yesno_seacrowd_imqa`, etc. When testing, pass `maxm_<subset>` to the `--subset_id` parameter.
Checkbox
- Create the dataloader script `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within the `{my_dataset}` folder.
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- Implement `_info()`, `_split_generators()`, and `_generate_examples()` in the dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- Confirm the dataloader script works with the `datasets.load_dataset` function.
- Confirm that the dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`.
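The config naming from the description and the `BUILDER_CONFIGS` requirement in the checklist can be sketched as follows. `DemoBuilderConfig` is a hypothetical stand-in for `SEACrowdConfig` so the snippet runs on its own. Its fields and the helper function are assumptions, not the PR's actual code.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class DemoBuilderConfig:
    """Hypothetical stand-in for SEACrowdConfig with only the fields used here."""
    name: str
    schema: str
    subset_id: str


SUBSETS = ("regular", "yesno")           # the two files per language
SCHEMAS = ("source", "seacrowd_imqa")    # one source schema, one seacrowd schema

# One config per (subset, schema) pair, named maxm_<subset>_<schema>.
BUILDER_CONFIGS: List[DemoBuilderConfig] = [
    DemoBuilderConfig(
        name=f"maxm_{subset}_{schema}",
        schema=schema,
        subset_id=f"maxm_{subset}",
    )
    for subset, schema in product(SUBSETS, SCHEMAS)
]


def has_required_schemas(configs: List[DemoBuilderConfig]) -> bool:
    """Check for at least one source config and one seacrowd config."""
    schemas = {config.schema for config in configs}
    return "source" in schemas and any(s.startswith("seacrowd_") for s in schemas)
```

With this layout, the value passed to `--subset_id` corresponds to the `subset_id` field, that is, the config name without its schema suffix.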