SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Duolingo STAPLE 2020 #522

Closed (SamuelCahyawijaya closed this 3 months ago)

SamuelCahyawijaya commented 5 months ago

Dataloader name: duolingo_staple_2020/duolingo_staple_2020.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?duolingo_staple_2020

Dataset: duolingo_staple_2020
Description: This dataset is provided by Duolingo for their Simultaneous Translation and Paraphrase for Language Education (STAPLE) shared task in 2020. It contains English prompts and corresponding sets of plausible translations in five other languages, including Vietnamese. Each prompt is provided with a baseline automatic reference translation from Amazon, as well as some accepted translations with corresponding user response rates used for task scoring.
Subsets: aws_baseline, gold
Languages: eng, vie
Tasks: Machine Translation, Paraphrasing
License: Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/38OJR6
HF URL: -
Paper URL: https://aclanthology.org/2020.ngt-1.28.pdf
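
For whoever picks this up, here is a rough sketch of how the gold subset could be parsed into prompts with weighted translations. The file layout is an assumption for illustration and should be checked against the downloaded data: blank-line-separated blocks, a first line of `prompt_id|English prompt`, then one `translation|weight` line per accepted translation. The names `Prompt` and `parse_gold` are hypothetical and are not part of the SEACrowd codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    prompt_id: str
    text: str
    # (translation, weight) pairs; the weight stands in for the user response rate
    translations: list = field(default_factory=list)


def parse_gold(raw: str) -> list:
    """Parse gold-style text under the ASSUMED block layout described above."""
    prompts = []
    for block in raw.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # First line of a block: "<prompt_id>|<English prompt>"
        prompt_id, text = lines[0].split("|", 1)
        prompt = Prompt(prompt_id, text)
        # Remaining lines: "<translation>|<weight>"; split on the last "|"
        # so a "|" inside the translation text does not break parsing.
        for ln in lines[1:]:
            translation, weight = ln.rsplit("|", 1)
            prompt.translations.append((translation, float(weight)))
        prompts.append(prompt)
    return prompts
```

Under the same assumption, the aws_baseline subset would have a single translation line per block with no weight, so it would need a slightly different inner loop. Each parsed `Prompt` could then be flattened into one example per (prompt, translation) pair for the Machine Translation task.
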
akhdanfadh commented 5 months ago

self-assign

sabilmakbar commented 4 months ago

Can we change the homepage to this? https://sharedtask.duolingo.com/#data