Closed muhammadravi251001 closed 1 month ago
A quick comment: I saw some inconsistencies when addressing single/simple subset names. Would you mind amending this first, @muhammadravi251001? Thanks!
This is done by changing the `single` subset name to `simple`. In the paper and the dataset itself, the author uses `single` and `simple` interchangeably: `single` for the dataset name and `simple` for the explanation in the paper. Done in the https://github.com/SEACrowd/seacrowd-datahub/pull/641/commits/ef563f67f22fd667e440cfc785b4f1cb230ef94a commit.
- Did you happen to not implement the `meta` field for SEACrowd QA? You might want to pull the latest QA schema first to fix it.
I've already added the `meta` field in the https://github.com/SEACrowd/seacrowd-datahub/pull/641/commits/735cd2e5b77a51f29fe46cc5dd2740478bc0e99f commit.
- I saw that the `complex` subset data has a `tipe` key (probably the `type` key in Indonesian), which has 4 distinct values for `train` (image 1) and 5 for `test` (image 2). Would you like to add it to the `meta` field as well for SEACrowd and introduce it as a new feature for the `complex` source schema?
Alright, done in https://github.com/SEACrowd/seacrowd-datahub/pull/641/commits/f2aec29b6fe595cab840751c3db8ecbe5e8bfe8c commit.
Thanks for the careful review, Sir!
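For readers following the thread, here is a minimal sketch of what carrying `tipe` through to the `meta` field could look like. Only the `tipe` and `meta` names come from the discussion above; the sample rows, the other field names, and the helpers `distinct_tipe_values` and `to_seacrowd_qa` are assumptions for illustration, and the actual change lives in the linked commit.

```python
from collections import Counter

# Hypothetical source rows. Only the `tipe` key is taken from the review;
# every other key and all values are made-up placeholders.
source_rows = [
    {"question": "q1", "context": "c1", "answer": "a1", "tipe": "A"},
    {"question": "q2", "context": "c2", "answer": "a2", "tipe": "B"},
    {"question": "q3", "context": "c3", "answer": "a3", "tipe": "A"},
]


def distinct_tipe_values(rows):
    """Count how often each `tipe` value occurs in one split."""
    return Counter(row["tipe"] for row in rows)


def to_seacrowd_qa(idx, row):
    """Map one source row to a QA-style record, keeping `tipe` under `meta`.

    The record layout is an assumption loosely modelled on a QA schema,
    not a copy of the SEACrowd definition.
    """
    return {
        "id": str(idx),
        "question": row["question"],
        "context": row["context"],
        "answer": [row["answer"]],
        "meta": {"tipe": row["tipe"]},
    }


records = [to_seacrowd_qa(i, row) for i, row in enumerate(source_rows)]
tipe_counts = distinct_tipe_values(source_rows)
```

Running `distinct_tipe_values` on each split is one way to reproduce the kind of per-split comparison the reviewer described for `train` and `test`.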
lgtm
Thanks for the approval, Sir!
Hi! No comment from me👍 lgtm
Thanks for the approval, Sir!
Since both reviewers approved, I will continue to squash & merge this PR. Thanks for the review!
Title: Add Dataloader AC-IQuAD
First line PR Message: Closes https://github.com/SEACrowd/seacrowd-datahub/issues/612
Checkbox
- `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within `{my_dataset}` folder.
- `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- `datasets.load_dataset` function.
- `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`.
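As an illustration of the `BUILDER_CONFIGS` item, the sketch below builds one source config and one seacrowd config per subset. `ConfigStub` is a hypothetical stand-in for `SEACrowdConfig`, and the `ac_iquad` prefix, the `seacrowd_qa` schema name, and the naming pattern are assumptions rather than values taken from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigStub:
    """Hypothetical stand-in for SEACrowdConfig (not the real class)."""
    name: str
    schema: str
    subset_id: str


DATASETNAME = "ac_iquad"         # assumed folder-style dataset name
SUBSETS = ["simple", "complex"]  # subset names discussed in the review


def build_configs(dataset_name, subsets):
    """Return a source config and a seacrowd config for every subset."""
    configs = []
    for subset in subsets:
        subset_id = f"{dataset_name}_{subset}"
        configs.append(ConfigStub(f"{subset_id}_source", "source", subset_id))
        configs.append(ConfigStub(f"{subset_id}_seacrowd_qa", "seacrowd_qa", subset_id))
    return configs


BUILDER_CONFIGS = build_configs(DATASETNAME, SUBSETS)
config_names = [config.name for config in BUILDER_CONFIGS]
```

Under this assumed pattern, the `subset_id` of each config is the name without its `_source` or `_seacrowd_qa` suffix, which is the form the `--subset_id` flag in the test command asks for.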