Create dataset loader for NLLB Seed

NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?nllb_seed

Dataset	nllb_seed
Description	NLLB Seed is a set of professionally translated sentences in the Wikipedia domain. The data was sampled from Wikimedia’s "List of articles every Wikipedia should have", a collection of topics in different fields of knowledge and human activity. NLLB Seed consists of around six thousand sentences in 39 languages. It is meant to be used for training rather than model evaluation. Because of this, NLLB Seed does not go through the human quality assurance process present in FLORES-200.
License	CC-BY-NC 4.0

IndoNLP / nusa-crowd

Create dataset loader for NLLB Seed #243

self-assign