SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for KDE4 #365

Status: Closed (closed by SamuelCahyawijaya 6 months ago)

SamuelCahyawijaya commented 8 months ago

Dataloader name: kde4/kde4.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?kde4

Dataset: kde4
Description: A parallel corpus of KDE4 localization files. The corpus is available in 92 languages in total, with 4099 bitexts.
Subsets: -
Languages: ind, khm, zlm, tha, vie
Tasks: Machine Translation
License: Unknown (unknown)
Homepage: https://opus.nlpl.eu/KDE4.php
HF URL: -
Paper URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
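
For whoever picks this up, here is a rough sketch of how the example generator for a bitext like this could look. It assumes the corpus has been downloaded as two line-aligned plain-text files (one sentence per line, same line count on both sides), and it assumes a text-to-text record layout with the keys `id`, `text_1`, `text_2`, `text_1_name`, `text_2_name`. The function name `read_bitext`, the file layout, and the key names are illustrative assumptions, not taken from the repository, so check them against the actual schema and download format before relying on this.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Dict, Iterator, Union

PathLike = Union[str, Path]


def read_bitext(
    src_path: PathLike,
    tgt_path: PathLike,
    src_lang: str,
    tgt_lang: str,
) -> Iterator[Dict[str, str]]:
    """Yield aligned sentence pairs from two line-aligned plain-text files.

    Hypothetical helper: the record keys below are an assumed text-to-text
    layout, not a confirmed schema.
    """
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a line-count
        # mismatch is detected instead of being silently truncated.
        for idx, (src_line, tgt_line) in enumerate(zip_longest(src, tgt)):
            if src_line is None or tgt_line is None:
                raise ValueError(f"line count mismatch at line {idx + 1}")
            yield {
                "id": str(idx),
                "text_1": src_line.rstrip("\n"),
                "text_2": tgt_line.rstrip("\n"),
                "text_1_name": src_lang,
                "text_2_name": tgt_lang,
            }
```

In a dataloader, something like this would be called once per language pair with the two extracted files, for example `read_bitext("kde4.en", "kde4.id", "eng", "ind")`, where both file names are placeholders.
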
ssun32 commented 8 months ago

self-assign