SamuelCahyawijaya commented 5 months ago

Dataloader name: gnome/gnome.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?gnome

Dataset	gnome
Description	A parallel corpus of GNOME localization files, containing the interface text of the GNU Network Object Model Environment (GNOME) as published by GNOME translation teams. Text in this dataset is relatively short and technical.
Subsets	-
Languages	eng, vie, mya, ind, tha, tgl, zlm, lao
Tasks	Machine Translation
License	Unknown (unknown)
Homepage	https://opus.nlpl.eu/GNOME/corpus/version/GNOME
HF URL	-
Paper URL	https://aclanthology.org/L12-1246/

akhdanfadh commented 5 months ago

Hey, from what I understand, there is no source language for this dataset. Should I make all possible translation pairs with all languages listed here?

EDIT: Based on discussion #456, I'll implement all possible language pairs.

For parallel MT dataloaders, we agreed upon having a subset for every possible direction with at least 1 SEA language.
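
A rough sketch of what "every possible direction with at least 1 SEA language" could look like for the languages listed in this issue. The helper `translation_directions` and the `gnome_{src}_{tgt}` subset naming are hypothetical and not taken from the SEACrowd codebase. The sketch also assumes that every listed language except `eng` counts as a SEA language.

```python
from itertools import permutations

# Language codes copied from the "Languages" row of this issue.
LANGUAGES = ["eng", "vie", "mya", "ind", "tha", "tgl", "zlm", "lao"]

# Assumption: everything except English is treated as a SEA language.
SEA_LANGUAGES = set(LANGUAGES) - {"eng"}


def translation_directions(languages, sea_languages):
    """Return every ordered (source, target) pair with at least one SEA language."""
    return [
        (src, tgt)
        for src, tgt in permutations(languages, 2)
        if src in sea_languages or tgt in sea_languages
    ]


directions = translation_directions(LANGUAGES, SEA_LANGUAGES)

# Hypothetical subset names; the real dataloader config names may differ.
subset_names = [f"gnome_{src}_{tgt}" for src, tgt in directions]

print(len(directions))
```

Since `eng` is the only non-SEA language in the list, no ordered pair gets filtered out here, so this yields 8 × 7 = 56 directions. The filter only starts to matter once a second non-SEA language is added.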

akhdanfadh commented 5 months ago

SEACrowd / seacrowd-datahub

Create dataset loader for GNOME #513

self-assign