SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for IDK-MRC-NLI #615

SamuelCahyawijaya closed this issue 1 month ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: idk_mrc_nli/idk_mrc_nli.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?idk_mrc_nli

Dataset: idk_mrc_nli
Description: The IDK-MRC-NLI dataset is derived from the IDK-MRC question answering dataset. Its contradiction sets are determined using named entity recognition (NER), chunking tags, regex, and embedding similarity. Beyond premise, hypothesis, and label, each data instance carries additional columns with properties aligned with the NER and chunking tags. The dataset is designed for Natural Language Inference (NLI) tasks and, according to its description, draws on diverse sources for broader coverage.
Subsets: Indonesian
Languages: ind
Tasks: Natural Language Inference
License: Creative Commons Attribution Share Alike 4.0 (cc-by-sa-4.0)
Homepage: https://huggingface.co/datasets/muhammadravi251001/idkmrc-nli
HF URL: https://huggingface.co/datasets/muhammadravi251001/idkmrc-nli
Paper URL: -
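
As a rough illustration of what the loader needs to do, the sketch below maps rows with the `premise`, `hypothesis`, and `label` columns named in the description onto a text-pair record. The target field names (`id`, `text_1`, `text_2`, `label`) are an assumption about the pairs schema, and `to_pairs_example` and `generate_examples` are hypothetical helper names, not part of any existing loader. Extra columns such as the NER and chunking properties are simply dropped here.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def to_pairs_example(idx: int, row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one source row to a text-pair record.

    Assumes the row has `premise`, `hypothesis`, and `label` keys, as the
    dataset description states. The output field names are an assumed
    pairs-style schema, not a confirmed one.
    """
    return {
        "id": str(idx),
        "text_1": row["premise"],
        "text_2": row["hypothesis"],
        "label": row["label"],
    }


def generate_examples(
    rows: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (key, example) tuples, the shape a `datasets` builder expects."""
    for idx, row in enumerate(rows):
        yield idx, to_pairs_example(idx, row)
```

In a real loader the `rows` iterable would come from the downloaded dataset files; here any iterable of dictionaries works, which keeps the mapping easy to unit test.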