The Machine Translation from One Book (MTOB) dataset is drawn entirely from Visser (2022), a collection of documentation for the Kalamang language based on 11 months of fieldwork conducted in Mas over the course of four years. It consists of three sets of resources: (1) the body of the grammar book, (2) a bilingual wordlist, and (3) an extremely small corpus of parallel Kalamang-English sentences.
Subsets: Grammar Book, Bilingual Wordlist, Parallel Sentence Corpus
Languages: kgv, eng
Tasks: Machine Translation, Language Modeling, Word lists
Dataloader name: mtob/mtob.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?mtob
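
As a rough illustration of how the three resources described above might be held in memory, here is a minimal Python sketch. The class `MTOBResources`, the helper `gloss_with_wordlist`, and every entry in the example are hypothetical: they are not part of the `mtob/mtob.py` dataloader, and the source-side tokens (`aaa`, `bbb`) are placeholders, not real Kalamang words. The helper only does a word-by-word wordlist lookup, which is a naive baseline and not a translation method proposed by the dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MTOBResources:
    """Hypothetical container mirroring the three MTOB subsets."""
    grammar_book: str                           # full text of the grammar book
    wordlist: Dict[str, str]                    # source word -> English gloss
    parallel_sentences: List[Tuple[str, str]]   # (source sentence, English sentence)


def gloss_with_wordlist(sentence: str, wordlist: Dict[str, str],
                        unknown: str = "<unk>") -> List[str]:
    """Look up each whitespace-separated token in the wordlist.

    Tokens missing from the wordlist are replaced by the `unknown` marker.
    """
    return [wordlist.get(token.lower(), unknown) for token in sentence.split()]


# Placeholder data only: these strings are NOT real Kalamang.
resources = MTOBResources(
    grammar_book="(grammar book text would go here)",
    wordlist={"aaa": "water", "bbb": "house"},
    parallel_sentences=[("aaa bbb", "water house")],
)

glosses = gloss_with_wordlist("Aaa zzz bbb", resources.wordlist)
print(glosses)
```

A lookup like this ignores morphology and word order entirely, which is the information the grammar book subset would be needed to supply.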