SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Okapi m-TruthfulQA #477

Open SamuelCahyawijaya opened 8 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: okapi_m_truthfulqa/okapi_m_truthfulqa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?okapi_m_truthfulqa

Dataset: okapi_m_truthfulqa
Description: m-TruthfulQA is a multi-lingual version of TruthfulQA, a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics.
Subsets: -
Languages: ind, vie
Tasks: Question Answering
License: Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage: http://nlp.uoregon.edu/download/okapi-eval/datasets/
HF URL: https://huggingface.co/datasets/jon-tow/okapi_truthfulqa
Paper URL: https://arxiv.org/abs/2307.16039
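
For whoever picks this up, here is a rough sketch of how one record might be flattened into a question-answering example. It is not the SEACrowd loader template. The input field names (`question`, `mc1_targets`, `choices`, `labels`) are an assumption about how the TruthfulQA multiple-choice data is laid out and should be checked against the actual files. The output keys, the helper name `to_qa_example`, and the sample record are hypothetical and only for illustration.

```python
from typing import Any, Dict, List


def to_qa_example(idx: int, record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one multiple-choice record into a QA-style dict.

    Assumed input layout (verify against the real data):
        {"question": str, "mc1_targets": {"choices": [...], "labels": [...]}}
    where labels[i] == 1 marks choices[i] as a correct answer.
    """
    targets = record["mc1_targets"]
    choices: List[str] = list(targets["choices"])
    labels: List[int] = list(targets["labels"])
    if len(choices) != len(labels):
        raise ValueError("choices and labels must have the same length")

    # Keep every choice whose label is 1 as an accepted answer.
    answers = [choice for choice, label in zip(choices, labels) if label == 1]

    # Hypothetical output keys, not necessarily the SEACrowd QA schema.
    return {
        "id": str(idx),
        "question": record["question"],
        "type": "multiple_choice",
        "choices": choices,
        "answer": answers,
    }


# Made-up placeholder record, not taken from the dataset.
sample = {
    "question": "placeholder question",
    "mc1_targets": {
        "choices": ["option A", "option B", "option C"],
        "labels": [0, 1, 0],
    },
}

example = to_qa_example(0, sample)
```

In a real loader the records would presumably be read from the dataset listed under HF URL, once per language (ind, vie), instead of from a hand-written dict.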

tellarin commented 8 months ago

self-assign

tellarin commented 7 months ago

Back working on this. Sorry for the delay.