Closed muhammadravi251001 closed 2 months ago
@muhammadravi251001 Checked, LGTM. Thank you for the great work. Just one small issue, as in the previous one: the comment needs to be deleted.
Thanks for the review, Sir!
Hi @muhammadravi251001, thanks for your hard work! The dataloader works well on my end. Just confirming, out of curiosity: I checked the label distribution per split and found that none of the data instances is labeled as "neutral" (i.e., `1`).
train {0: 118445, 1: 0, 2: 118445}
validation {0: 11874, 1: 0, 2: 11874}
test {0: 11873, 1: 0, 2: 11873}
Is this intentional? If it is, I'll proceed with the merge.
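For anyone who wants to repeat this kind of check, here is a minimal sketch of counting labels per split. It assumes each example is a dict with an integer `label` field; the `toy_splits` data and the `label_distribution` helper are hypothetical stand-ins, not part of the dataloader in this PR.

```python
from collections import Counter

# Hypothetical toy splits; each example is a dict with an integer "label".
# Following the thread, label id 1 stands for "neutral".
toy_splits = {
    "train": [{"label": 0}, {"label": 2}, {"label": 0}, {"label": 2}],
    "validation": [{"label": 0}, {"label": 2}],
    "test": [{"label": 2}, {"label": 0}],
}


def label_distribution(examples, label_ids=(0, 1, 2)):
    """Count examples per label id, reporting 0 for ids that never occur."""
    counts = Counter(example["label"] for example in examples)
    return {label_id: counts.get(label_id, 0) for label_id in label_ids}


# One distribution dict per split, in the same shape as the counts quoted above.
distributions = {name: label_distribution(rows) for name, rows in toy_splits.items()}
for name, dist in distributions.items():
    print(name, dist)
```

Iterating over a split of a loaded Hugging Face `datasets` object also yields dict examples, so the same helper should work there as well, assuming the label column is named `label`.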
Hi, Ms. Holy.
Yes, it was intentional. My model does binary classification (`entailment` or `contradiction`) to get rid of the "gray characteristic" of `neutral`. It is also meant to tell the QA model (for my research) to avoid low-confidence answers caused by the "gray characteristic" of `neutral`.
Even so, `neutral` is still needed in my NLI dataset, like this dataset.
Thanks for the clarification, @muhammadravi251001! Merging now.
PS: No need to call me "Ms.", no worries. 😂
Title: Add Dataloader SQuAD-ID-NLI
First line PR Message: Closes https://github.com/SEACrowd/seacrowd-datahub/issues/617
Notes
- `_CITATION` field: because the notification for my workshop is on 18 April, I can't write that section yet. On 18 April, I will revisit and update `_CITATION`.

Checkbox
- Create the dataloader script `seacrowd/sea_datasets/{my_dataset}/{my_dataset}.py` (please use only lowercase and underscore for dataset folder naming, as mentioned in dataset issue) and its `__init__.py` within `{my_dataset}` folder.
- Provide values for the `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_LOCAL`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_SEACROWD_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `SEACrowdConfig` for the source schema and one for a seacrowd schema.
- Confirm dataloader script works with `datasets.load_dataset` function.
- Confirm that the dataloader script passes the test suite run with `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py` or `python -m tests.test_seacrowd seacrowd/sea_datasets/<my_dataset>/<my_dataset>.py --subset_id {subset_name_without_source_or_seacrowd_suffix}`.
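
As a rough illustration of the `_generate_examples()` item in the checklist, the sketch below shows the general shape such a method is expected to produce: a generator of `(key, example)` pairs. It is written as a standalone function so it runs without the `datasets` library. The field names `premise`, `hypothesis`, and `label` and the `toy_rows` data are assumptions for illustration, not the actual schema of this dataloader.

```python
def generate_examples(rows):
    """Yield (key, example) pairs, mirroring what `_generate_examples()` yields.

    `rows` is any iterable of dicts; the field names are hypothetical.
    """
    for idx, row in enumerate(rows):
        # The key must be unique within a split; the row index is a simple choice.
        yield idx, {
            "premise": row["premise"],
            "hypothesis": row["hypothesis"],
            "label": row["label"],
        }


# Hypothetical rows standing in for records parsed from the source files.
toy_rows = [
    {"premise": "p0", "hypothesis": "h0", "label": 0},
    {"premise": "p1", "hypothesis": "h1", "label": 2},
]

examples = list(generate_examples(toy_rows))
for key, example in examples:
    print(key, example)
```

In a real dataloader this logic would live inside the builder class and read from the files returned by `_split_generators()`, with one branch per schema listed in `BUILDER_CONFIGS`.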