TopQuadrant / shacl

SHACL API in Java based on Apache Jena
Apache License 2.0

Lightweight Docker image #148

Closed by supermaxiste 1 year ago

supermaxiste commented 1 year ago

Dear Holger,

your SHACL API tool is very nicely implemented, and this PR adds a Dockerfile to make it even more accessible. The Dockerfile uses a multi-stage build so that the final image contains only the minimal Java environment needed to run Jena for the SHACL API. The Apache Jena developers provided all the information and tools needed to determine those minimal requirements.

The current Dockerfile produces an image of 144 MB, making it relatively lightweight. Its entrypoint.sh script calls either shaclvalidate.sh or shaclinfer.sh, selected by the commands validate and infer respectively. I modified the README to explain the usage as follows:

Dockerfile Usage

The Dockerfile in the .docker folder includes a minimal Java Runtime Environment for the SHACL API and produces an image of 144 MB. To build the Docker image, use:

docker build -t topquadrant/shacl:1.4.2 .docker/

To use the Docker image, there are two possible commands. To run the validator:

docker run --rm -v /path/to/data:/data topquadrant/shacl:1.4.2 validate -datafile /data/myfile.ttl -shapesfile /data/myshapes.ttl

To run rule inferencing:

docker run --rm -v /path/to/data:/data topquadrant/shacl:1.4.2 infer -datafile /data/myfile.ttl -shapesfile /data/myshapes.ttl

Any other command after topquadrant/shacl:1.4.2 will print the following help page:

Please use this docker image as follows:
docker run -v /path/to/data:/data shacl_API [COMMAND] [PARAMETERS]
COMMAND:
    validate 
        to run validation
    infer
        to run rule inferencing
PARAMETERS:
    -datafile /data/myfile.ttl [MANDATORY]
        input to be validated (only .ttl format supported)
    -shapesfile /data/myshapes.ttl [OPTIONAL]
        shapes for validation (only .ttl format supported)
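
The dispatch described above (validate and infer mapped to shaclvalidate.sh and shaclinfer.sh, any other command printing the help page) could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the entrypoint.sh shipped in this PR: the SHACL_BIN variable, its default path, and the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of an entrypoint script; not the file from this PR.
# SHACL_BIN is an assumed location of shaclvalidate.sh and shaclinfer.sh.
SHACL_BIN="${SHACL_BIN:-/app/shacl/bin}"

# Print the help page quoted above.
print_help() {
    cat <<'EOF'
Please use this docker image as follows:
docker run -v /path/to/data:/data shacl_API [COMMAND] [PARAMETERS]
COMMAND:
    validate
        to run validation
    infer
        to run rule inferencing
PARAMETERS:
    -datafile /data/myfile.ttl [MANDATORY]
        input to be validated (only .ttl format supported)
    -shapesfile /data/myshapes.ttl [OPTIONAL]
        shapes for validation (only .ttl format supported)
EOF
}

# Route the first argument to the matching script and pass the
# remaining arguments through unchanged.
dispatch() {
    cmd="$1"
    if [ "$#" -gt 0 ]; then
        shift
    fi
    case "$cmd" in
        validate) "$SHACL_BIN/shaclvalidate.sh" "$@" ;;
        infer)    "$SHACL_BIN/shaclinfer.sh" "$@" ;;
        *)        print_help ;;
    esac
}

dispatch "$@"
```

Passing the remaining arguments with "$@" keeps file paths that contain spaces intact when they reach the underlying scripts.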

The current PR would already be a fully working Docker solution for shacl-1.4.2, but there could be another major improvement to this. If the container is made available on DockerHub by TopQuadrant, it would be possible to write a GitHub Action to automatically build and push an image for every new release of shacl. My colleagues and I from the Swiss Data Science Center - Open Research Data would be glad to further contribute as we'll be using shacl moving forward.
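
As a rough idea of what such a release step would run, here is a minimal sketch. It assumes the image is published under the name topquadrant/shacl, that the runner is already logged in to the registry, and that the script runs from the repository root; the release_image function and the DOCKER override variable are hypothetical helpers for illustration only.

```sh
#!/bin/sh
# Hypothetical release helper; not part of this PR.
# DOCKER can be overridden (e.g. DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Build the image from the .docker folder and push it,
# tagged with the release version given as the first argument.
release_image() {
    version="$1"
    image="topquadrant/shacl:$version"   # assumed image name
    "$DOCKER" build -t "$image" .docker/ && "$DOCKER" push "$image"
}

# Example: release_image 1.4.2
```

A CI job triggered on each new release could call this with the release tag as the version.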

HolgerKnublauch commented 1 year ago

Hi and thanks for your contribution. I honestly never had an opportunity to work with docker so far and cannot really comment on it, but I trust you have tried it out and are happy with it. Assuming you will be able to field questions and maintain this feature in the future, I'd be happy to accept your PR into master, OK?

supermaxiste commented 1 year ago

Hi Holger, I confirm that the Dockerfile has been tested and we're happy with it. My colleagues and I will make sure to address any questions or issues. In principle these should mostly be usage questions. To keep the Dockerfile updated with the latest version of this tool, we'll open a separate PR to automate the process.

HolgerKnublauch commented 1 year ago

Ok thanks. Let's go.