SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for SQuAD-ID-NLI #617

Closed by SamuelCahyawijaya 2 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: squad_id_nli/squad_id_nli.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?squad_id_nli

Dataset: squad_id_nli
Description: The SQuAD-ID-NLI dataset is derived from the SQuAD-ID question answering dataset, utilizing named entity recognition (NER), chunking tags, regex, and embedding similarity techniques to determine its contradiction sets. Collected through this process, the dataset comprises various columns beyond premise, hypothesis, and label, including properties aligned with NER and chunking tags. This dataset is designed to facilitate Natural Language Inference (NLI) tasks and contains information extracted from diverse sources to provide comprehensive coverage. Each data instance encapsulates premise, hypothesis, label, and additional properties pertinent to NLI evaluation.
Subsets: Indonesian
Languages: ind
Tasks: Natural Language Inference
License: Unknown (unknown)
Homepage: https://huggingface.co/datasets/muhammadravi251001/squadid-nli
HF URL: https://huggingface.co/datasets/muhammadravi251001/squadid-nli
Paper URL: -
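
Since each instance carries premise, hypothesis, and label columns plus extra NER and chunking properties, the core of the loader could look roughly like the sketch below. This is only an illustration under assumptions: the column names `premise`, `hypothesis`, and `label` are taken from the description above and have not been checked against the actual files, the output field names `id`, `text_1`, `text_2`, and `label` assume a pairs-style schema for sentence-pair tasks, and the helper names `to_pairs_example` and `generate_examples` are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def to_pairs_example(idx: int, row: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw row to a flat premise/hypothesis pair record.

    Assumes the raw row has "premise", "hypothesis", and "label" keys;
    any additional columns (e.g. NER or chunking properties) are dropped.
    """
    return {
        "id": str(idx),
        "text_1": row["premise"],      # premise goes first
        "text_2": row["hypothesis"],   # hypothesis goes second
        "label": row["label"],
    }


def generate_examples(
    rows: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (key, example) tuples, the shape a generator-style loader emits."""
    for idx, row in enumerate(rows):
        yield idx, to_pairs_example(idx, row)


if __name__ == "__main__":
    # Made-up row for illustration only; not taken from the real dataset.
    sample_rows = [
        {
            "premise": "Contoh premis.",
            "hypothesis": "Contoh hipotesis.",
            "label": "entailment",
            "extra_property": "ignored",
        }
    ]
    for key, example in generate_examples(sample_rows):
        print(key, example)
```

In a real dataloader the rows would come from the files at the HF URL above, and the label values would need to be confirmed against the data before declaring them in the schema.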