SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for chatgpt-malaysian-open-qa #532

Closed: SamuelCahyawijaya closed this issue 4 months ago

SamuelCahyawijaya commented 5 months ago

Dataloader name: chatgpt_malaysian_open_qa/chatgpt_malaysian_open_qa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?chatgpt_malaysian_open_qa

| Dataset | chatgpt_malaysian_open_qa |
| --- | --- |
| Description | Synthetic Malaysian Open QA, generated using ChatGPT 3.5 on MS Wikipedia, MS Common Crawl, and Malaysia Hansard. Files: common-crawl-qa.jsonl (69829 rows), hansard-qa.jsonl (42368 rows), wikipedia-qa.jsonl (44923 rows) |
| Subsets | common-crawl-qa.jsonl, hansard-qa.jsonl, wikipedia-qa.jsonl |
| Languages | zlm |
| Tasks | Question Answering |
| License | Creative Commons Attribution Non Commercial 2.0 (cc-by-nc-2.0) |
| Homepage | https://huggingface.co/datasets/mesolitica/chatgpt-malaysian-open-qa |
| HF URL | https://huggingface.co/datasets/mesolitica/chatgpt-malaysian-open-qa |
| Paper URL | - |
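
For anyone picking this up, here is a minimal sketch of how the three JSONL subsets might be read and sanity-checked against the row counts listed above. It is not the SEACrowd loader itself. The `resolve/main` URL layout is the usual Hugging Face download pattern and is assumed, not verified, for this repository; `subset_url`, `iter_jsonl`, and `count_rows` are hypothetical helper names, and the JSON field names inside each record are unknown, so records are kept as plain dicts.

```python
import json
from typing import Dict, Iterable, Iterator

# Repo id, file names, and row counts are taken from the issue description.
HF_REPO = "mesolitica/chatgpt-malaysian-open-qa"
EXPECTED_ROWS = {
    "common-crawl-qa.jsonl": 69829,
    "hansard-qa.jsonl": 42368,
    "wikipedia-qa.jsonl": 44923,
}


def subset_url(filename: str) -> str:
    """Build a download URL for one subset file.

    Assumption: the files sit at the repository root on the `main` branch.
    """
    return f"https://huggingface.co/datasets/{HF_REPO}/resolve/main/{filename}"


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line.

    Accepts any iterable of strings, including an open text file handle.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_rows(lines: Iterable[str]) -> int:
    """Count the records in a JSONL stream, skipping blank lines."""
    return sum(1 for _ in iter_jsonl(lines))
```

With a locally downloaded file, `count_rows(open("hansard-qa.jsonl", encoding="utf-8"))` can be compared against `EXPECTED_ROWS["hansard-qa.jsonl"]`; the three listed counts sum to 157120 rows.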

raileymontalan commented 5 months ago

self-assign