elexis-eu/dictionary-service

ELEXIS Dictionary Service

This tool provides a simple way to host dictionaries that can be contributed to the ELEXIS infrastructure. This interface is the reference implementation of the REST API defined here:

https://elexis-eu.github.io/elexis-rest

Installation

From Source

This tool can be built with Rust/Cargo using the following command

cargo build --release

This will create a single binary at target/release/elexis-dictionary-service.

By Docker

The dictionary service is available from Docker Hub.

You can run the command with

docker run -it --rm -p 8000:8000 jmccrae/elexis-dictionary-service

Usage

The ELEXIS dictionary service supports a number of commands

Loading data

Data can be loaded with the load command

USAGE:
    elexis-dictionary-service load [FLAGS] [OPTIONS] <data>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -c, --config <config>                                    Configuration to help with mapping
        --db-path <db_path>                                  The path to use for the database (Default: eds.db)
    -f, --format <json|ttl|tei>                              The format of the input
        --genre <gen|lrn|ety|spe|his|ort|trm>                The genre(s) of the dataset (comma separated)
        --id <id>                                            The identifier of the dataset
        --release <PUBLIC|NONCOMMERCIAL|RESEARCH|PRIVATE>    The release level of the resource

ARGS:
    <data>    The data to host

For example to load a file it is normally sufficient to give a command as follows:

# A Json file
elexis-dictionary-service load example/example.json
# A TEI-Lex0 file
elexis-dictionary-service load example/example-tei.xml --id tei_dict --release PUBLIC
# An OntoLex file
elexis-dictionary-service load example/example.rdf --release PUBLIC

Starting the server

The REST server may be started with the start command:

Start the server

USAGE:
    elexis-dictionary-service start [FLAGS] [OPTIONS]

FLAGS:
    -h, --help       Prints help information
        --no-sql     Do not use SQLite (all data is temporary and session only)
    -V, --version    Prints version information

OPTIONS:
    -c, --config <config>                                    Configuration to help with mapping
    -d, --data <data>                                        Also load a single data file
        --db-path <db_path>                                  The path to use for the database (Default: eds.db)
    -f, --format <json|ttl|tei>                              The format of the input
        --genre <gen|lrn|ety|spe|his|ort|trm>                The genre(s) of the dataset (comma separated)
        --id <id>                                            The identifier of the dataset
    -p, --port <port>                                        The port to start the server on
        --release <PUBLIC|NONCOMMERCIAL|RESEARCH|PRIVATE>    The release level of the resource

For example to start a server

elexis-dictionary-service start

The server will be available at http://localhost:8000/

To start a temporary server for a single file (not using SQlite) the following command can be used

elexis-dictionary-service start -d example/example.json --no-sql

Deleting a dictionary

A dictionary may be removed from the server with the delete command

USAGE:
    elexis-dictionary-service delete [OPTIONS] [data]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --db-path <db_path>    The path to use for the database (Default: eds.db)

ARGS:
    <data>    Data file to delete

For example

elexis-dictionary-service delete dict_id

Formats

Json

The Json format consists of an object of the following form

{
    "dict_id": {
        "meta": {    },
        "entries: [    ]
    }
}

Where dict_id is the name of the dictionary, the meta value is exactly as would be returned by the about REST call. The entries value is an array where each element is as would be returned by the entry as Json REST call

TEI-Lex0

The TEI-Lex0 document should be a valid XML document with at least the following tags

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>Name of the dictionary</author>
            </titleStmt>
            <publicationStmt>
                <publisher>Named of the publisher</publisher>
                <availability>
                    <licence target="http://url.of.licence">...</licence>
                </availability>
            </publicationStmt>
            <sourceDesc>
                <author>Name of the author</author>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <body>
       <entry xml:lang="en" xml:id="test">
        <form type="lemma">
            <orth>girl</orth>
        </form>
        <form type="variant">
            <orth>girls</orth>
        </form>
        <gramGrp>
            <gram type="pos" norm="NOUN">noun</gram>
        </gramGrp>
        <sense>
            <def>young female</def>#
        </sense>
    </body>
</TEI>

The following constraints are required

A licence must be given with a target
An entry must have a form[@type=lemma]
An entry must have a gram[@type=pos] and it should have a norm referring to a UD category unless mapping is used (see below)
An entry must have a lang and a id
An entry must not occur within another entry

OntoLex

An OntoLex document should be a valid Turtle document such as follows:

@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .

<#dictionary> a lime:Lexicon ;
    lime:language "en" ;
    dct:license <http://www.example.com/license> ;
    dct:description "A test resource" ;
    dct:creator [
        foaf:name "Joe Bloggs" ;
        foaf:mbox <mailto:test@example.com> ;
        foaf:homepage <http://www.example.com/>
    ] ;
    dct:publisher [
        foaf:name "Publisher"
    ] ;
    lime:entry <#entry1>, <#test> .

<#entry1> a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:commonNoun ;
    ontolex:canonicalForm [
        ontolex:writtenRep "cat"@en 
    ] ;
    ontolex:sense [
        skos:definition "This is a definition"@en
    ] .

<#test>  a ontolex:LexicalEntry ;
    ontolex:canonicalForm [
        ontolex:writtenRep "dog"@en 
    ] ;
    ontolex:sense [
        ontolex:reference <http://www.example.com/ontology>  
    ] .

In order to process the file well, certain information should be grouped together, in particular all information about the lexicon should follow after the triple

<#dictionary> a lime:Lexicon

A dictionary must have a lime:language and a dct:license.

The entry starts with a triple of the form

<#entry1> a ontolex:LexicalEntry

All triples after this until another similar triple occurs in the file are considered the description of this entry.

All entries must have an ontolex:canonicalForm with an ontolex:writtenRep.

All entries must be given by URIs and referred to by a lime:entry triple from a lexicon

Configuration

Configuration maybe performed using a configuration file. This is particularly useful for providing mappings. An example configuration is as below

{
    "posProperty": "http://www.lexinfo.net/ontology/2.0/lexinfo#partOfSpeech",
    "posMapping": {
        "substantive": "NOUN",
        "http://www.lexinfo.net/ontology/2.0/lexinfo#pronoun": "PRON"
    },
    "defaultId": "dict_id",
    "defaultRelease": "PUBLIC"
}

The configuration has the following values

posProperty: The URI of the RDF property used to indicate part-of-speech
posMapping: A mapping of values, either RDF URI or the content of TEI tags that is mapped to a given UD value (ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X)
defaultId: The default ID for a dictionary (instead of a --id flag)
defaultRelease: The default release level of the dictionary (PUBLIC, NONCOMMERCIAL, RESEARCH, PRIVATE)