SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for AC-IQuAD #612

Closed: SamuelCahyawijaya closed this issue 1 month ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: ac_iquad/ac_iquad.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?ac_iquad

Dataset: ac_iquad
Description: This is an automatically produced question answering dataset generated from Indonesian Wikipedia articles. Each entry consists of a context paragraph, a question and its answer, and the question's equivalent SPARQL query. Questions are split into two subsets: simple (the question corresponds to a single SPARQL triple pattern) and complex (the question corresponds to two triple patterns plus an optional typing triple).
Subsets: simple, complex
Languages: ind
Tasks: Question Answering
License: Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage: https://www.kaggle.com/datasets/realdeo/indonesian-qa-generated-by-kg
HF URL: -
Paper URL: https://link.springer.com/article/10.1007/s10579-023-09702-y
muhammadravi251001 commented 2 months ago

self-assign

sabilmakbar commented 1 month ago

I don't think this dataset is CC-licensed. The Kaggle page lists the license as unknown, and the part of the paper that mentions CC (the "Rights and permissions" section) refers to the article's license, not the dataset's license.

cc @holylovenia @muhammadravi251001

muhammadravi251001 commented 1 month ago

I will ask the creator of the dataset/paper, since he was my senior and TA when I was a college student.

muhammadravi251001 commented 1 month ago

The creator confirmed that the license is Creative Commons Attribution 4.0 (cc-by-4.0). Below is a screenshot of the relevant part of my WhatsApp chat with him (the messages before and after are cropped out for privacy).

[image: screenshot of the WhatsApp chat]

cc @holylovenia @sabilmakbar