SemanticComputing / fuseki-docker

Apache Jena Fuseki with SeCo extensions
MIT License
33 stars 15 forks source link

Jena Fuseki Docker image

Docker image for Apache Jena Fuseki with SeCo extensions, inspired by stain/jena-fuseki.

Available on Docker Hub: secoresearch/fuseki.

Default configuration

The Fuseki administrative interface is accessible at http://localhost:3030 with the admin password defined as docker run parameter (see the Run section below).

The container has a preconfigured service/dataset ds that uses TDB for data storage and has a Lucene text index (see assembler.ttl for configuration).

The dataset has the tdb:unionDefaultGraph set to true, meaning that the query patterns on the default graph match against the union of the named graphs. Note: the stored default graph is not part of this union of the named graphs. Thus, if you add data into the default graph, you will have to access the graph data by using the special name <urn:x-arq:DefaultGraph> in a GRAPH pattern.

The query timeout is configured to 60 seconds by default (see the Run section below for instructions on configuration).

Data access

The data can be accessed via the default endpoints:

Other endpoints can be enabled, as well (see the Run section for instructions):

The container includes Jena tdbloader, textindexer, and tdbstats scripts for loading RDF data into TDB model. See the Dockerfile of the congress-legislators dataset for an example.

Persistent storage

There are two options for persistent data storage (by default the data is lost when the container is removed):

  1. Use volume/bind mount for the directory /fuseki-base/databases. Useful if you need to update the data in runtime, or you don't want to include the data inside your Docker image (e.g. to keep the image size small; see the Run section below for an example).
  2. Create your own Dockerfile, using secoresearch/fuseki as a parent image, and load the data in the build phase (i.e. the data is included in the Docker image). Useful for static, small(ish), read-only datasets, and for distributing your dataset + Fuseki as a self-contained container image. See the Dockerfile of the congress-legislators dataset for an example.

Note: if you wish to add new datasets using the Fuseki admin UI - and persist them - you also need to use volume/bind mount for the directory /fuseki-base/configuration. In that case, when using bind mount/pre-populated volume you will have to copy the assembler.ttl on the volume/bind mount so that it is visible for the container as /fuseki-base/configuration/assembler.ttl (when using non-prepopulated volume, the file /fuseki-base/configuration/assembler.ttl in the image gets copied to the volume so no need to copy the file manually).

Note 2: if you have the assembler.ttl file on a bind mount, using environment variables to enable Fuseki's endpoints (for writing data; see the Run section for details), e.g. ENABLE_DATA_WRITE, is not encouraged. Using both bind mount for the assembler.ttl file and e.g. ENABLE_DATA_WRITE=true causes a problem related to file permissions (sed: couldn’t open temporary file /fuseki-base/configuration/sedXXX: Permission denied) (which can be solved by either running the container as the root user or giving write permissions to the host directory that is mounted to the container; at least on Linux). The recommended way when having the assembler.ttl file on a bind mount is to edit the assembler.ttl file by hand to enable endpoints.

Note on running in OpenShift, if you use this image as a parent image (e.g. use your own Dockerfile to load the data inside the image using TDBLOADER): as containers are run as an arbitrary user, you'll have to ensure the write permission on the TDB and index directories, e.g. by adding the following lines in your Dockerfile after the tdbloader and indexing commands:

# Set permissions to allow fuseki to run as an arbitrary user
RUN chgrp -R 0 $FUSEKI_BASE \
    && chmod -R g+rwX $FUSEKI_BASE

Build

docker build --squash -t secoresearch/fuseki .

Run

docker run --rm -it -p 3030:3030 --name fuseki -e ADMIN_PASSWORD=[PASSWORD] -e ENABLE_DATA_WRITE=[true|false] -e ENABLE_UPDATE=[true|false] -e ENABLE_SHACL=[true|false] -e QUERY_TIMEOUT=[number in milliseconds] -e USE_SYSTEM_CA_CERTS=[|1] --mount type=bind,source="$(pwd)"/fuseki-data,target=/fuseki-base/databases secoresearch/fuseki

Or to support adding new datasets using the Fuseki admin UI:

mkdir fuseki-data
mkdir fuseki-configuration
cp -p assembler.ttl fuseki-configuration/
# edit fuseki-configuration/assembler.ttl to enable the endpoints you wish
docker run --rm -it -p 3030:3030 --name fuseki -e ADMIN_PASSWORD=[PASSWORD] -e QUERY_TIMEOUT=[number in milliseconds] --mount type=bind,source="$(pwd)"/fuseki-data,target=/fuseki-base/databases --mount type=bind,source="$(pwd)"/fuseki-configuration,target=/fuseki-base/configuration secoresearch/fuseki

The same run command can be used to pull and run the container from Docker Hub (no need to build the image first).