SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for IndonesianNMT #338

Closed: SamuelCahyawijaya closed this issue 7 months ago

SamuelCahyawijaya commented 8 months ago

Dataloader name: indonesiannmt/indonesiannmt.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?indonesiannmt

Dataset: indonesiannmt
Description: Repository containing datasets for data automatically generated from gpt-3.5-turbo and text-davinci-003 used in the work "Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia"
Subsets: -
Languages: ind, jav, min, sun, ban
Tasks: Machine Translation
License: Unknown (unknown)
Homepage: https://github.com/luckysusanto/IndonesianNMT
HF URL: -
Paper URL: https://arxiv.org/abs/2311.00998

luckysusanto commented 8 months ago

self-assign