3000+ machine-readable open source dictionaries, distributed by the Applied Computational Linguistics lab at the University of Augsburg, Germany, and by the research group Linked Open Dictionaries (LiODi, funded 2015-2020 by BMBF at Goethe University Frankfurt, Germany). All data are provided in OntoLex-Lemon and TIAD-TSV.
For every dataset (stable and experimental), provide a file langs.tsv and a file lang-pairs.tsv in the root directory of the dataset.
Use the following structure:
langs.tsv:
TAG<TAB>FILE<TAB>ENTRIES<TAB>LICENSE
TAG: primary BCP 47 language tag, omitting subtags, e.g., en for en-US
FILE: OntoLex RDF file, optionally inside a zip or other archive. Separate a file within an archive from the archive path with a colon (:)
ENTRIES: number of lexical entries (i.e., number of lexical entry URIs)
LICENSE: license acronym
example:
en<TAB>ontolex/archive.zip:en/dict1.ttl<TAB>10000<TAB>CC-BY 4.0
Note that multiple dictionaries per language variety can exist.
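A minimal sketch of how a langs.tsv file could be read, assuming the file has no header row and that the first colon in FILE separates the archive path from the file inside it. The names LangEntry and parse_langs_tsv are hypothetical and not part of the dataset tooling.

```python
from typing import List, NamedTuple, Optional


class LangEntry(NamedTuple):
    tag: str                # TAG: primary language tag, e.g. "en"
    archive: Optional[str]  # archive path, or None if FILE is not in an archive
    path: str               # OntoLex RDF file (inside the archive, if any)
    entries: int            # ENTRIES: number of lexical entries
    license: str            # LICENSE: license acronym


def parse_langs_tsv(text: str) -> List[LangEntry]:
    """Parse the contents of langs.tsv (assumption: no header row)."""
    result = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        tag, file_field, count, license_name = fields
        # Assumption: the first colon separates archive path and member file.
        archive, sep, member = file_field.partition(":")
        if not sep:
            archive, member = None, file_field
        result.append(LangEntry(tag, archive, member, int(count), license_name))
    return result
```

Because the same TAG may occur on several lines, the sketch returns a list rather than a dictionary keyed by language.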
lang-pairs.tsv:
SRC<TAB>TGT<TAB>FILE<TAB>ROWS<TAB>SOURCES
SRC: source language tag (see TAG above)
TGT: target language tag (see TAG above)
FILE: TIAD-TSV file (see FILE above)
ROWS: number of rows in FILE, i.e., translation pairs. FILE must not contain duplicates.
SOURCES: one or more source files; these should correspond to langs.tsv FILE entries so that the license can be recovered
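The two constraints on lang-pairs.tsv (no duplicate rows in FILE, and SOURCES that lead back to a license) could be checked roughly as follows. This is a sketch under assumptions: the separator between multiple SOURCES entries is not specified here, so a comma is assumed, and the helper names recover_licenses and count_unique_rows are hypothetical.

```python
from typing import Dict, Iterable, List


def recover_licenses(sources: str, license_by_file: Dict[str, str],
                     sep: str = ",") -> List[str]:
    """Look up the license of each SOURCES entry via the langs.tsv FILE column.

    `sep` is an assumption: the separator inside SOURCES is not specified.
    `license_by_file` maps langs.tsv FILE values to their LICENSE values.
    """
    licenses = []
    for source in sources.split(sep):
        source = source.strip()
        if source not in license_by_file:
            raise KeyError(f"SOURCES entry not listed in langs.tsv: {source}")
        licenses.append(license_by_file[source])
    return licenses


def count_unique_rows(rows: Iterable[str]) -> int:
    """Return the number of rows (the ROWS value), raising on a duplicate."""
    seen = set()
    for row in rows:
        row = row.rstrip("\r\n")
        if row in seen:
            raise ValueError(f"duplicate row: {row!r}")
        seen.add(row)
    return len(seen)
```

The value returned by count_unique_rows can then be compared with the ROWS column for the same FILE.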