SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #617 | Add Dataloader SQuAD-ID-NLI #633

Closed. muhammadravi251001 closed this 2 months ago

muhammadravi251001 commented 3 months ago

Title: Add Dataloader SQuAD-ID-NLI

First line PR Message: Closes https://github.com/SEACrowd/seacrowd-datahub/issues/617

muhammadravi251001 commented 2 months ago

@muhammadravi251001 Checked, LGTM. Thank you for the great work. Just one small issue, as in the previous one: the comment needs to be deleted.

Thanks for the review, Sir!

muhammadravi251001 commented 2 months ago

Hi @muhammadravi251001, thanks for your hard work! The dataloader works well on my end. Just to confirm: out of curiosity, I checked the label distribution per split and found that none of the data instances are labeled "neutral" (i.e., 1).

train {0: 118445, 1: 0, 2: 118445}
validation {0: 11874, 1: 0, 2: 11874}
test {0: 11873, 1: 0, 2: 11873}

Is this intentional? If it is, I'll proceed with the merge.

Hi, Ms. Holy.

Yes, it was intentional. My model does binary classification (entailment or contradiction) to get rid of the "gray characteristic" of neutral. It also tells the QA model (for my research) to avoid low-confidence answers caused by that same "gray characteristic".

Even so, neutral is still needed in my NLI datasets, such as this one.
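
For reference, the per-split label counts quoted above could be reproduced with a sketch like the one below. It is only an illustration under stated assumptions: the helper names `label_distribution` and `missing_labels` are hypothetical, the toy labels are stand-ins rather than the real data, and the thread only confirms that label id 1 means "neutral".

```python
from collections import Counter

# Label ids seen in the thread's counts; only 1 == "neutral" is confirmed there.
LABEL_IDS = (0, 1, 2)


def label_distribution(splits):
    """Count how often each label id occurs in every split.

    `splits` maps a split name to an iterable of integer labels.
    Label ids that never occur are reported with a count of 0.
    """
    distribution = {}
    for split_name, labels in splits.items():
        counts = Counter(labels)
        distribution[split_name] = {label: counts.get(label, 0) for label in LABEL_IDS}
    return distribution


def missing_labels(distribution):
    """Return, per split, the label ids that have zero instances."""
    return {
        split: [label for label, n in counts.items() if n == 0]
        for split, counts in distribution.items()
    }


# Toy labels standing in for the real splits (not the actual dataset).
toy_splits = {"train": [0, 2, 2, 0], "validation": [0, 2]}
dist = label_distribution(toy_splits)
print(dist)
print(missing_labels(dist))
```

With a real dataloader, the label lists would presumably come from each split's label column; the column name depends on the schema and is not stated in this thread.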

holylovenia commented 2 months ago

Thanks for the clarification, @muhammadravi251001! Merging now.

PS: No need to call me "Ms.", no worries. 😂