Swirrl / csv2rdf

Clojure library and command line application for converting CSV to RDF. An implementation of the W3C CSVW specifications
Eclipse Public License 1.0

csv2rdf

Command line application (and Clojure library) for converting CSV to RDF according to the W3C specifications for CSV on the Web (CSVW).

Native Builds

We provide CI-generated native builds of the csv2rdf command line app for Linux (AMD64) and macOS (AMD64). They are attached to each release.

Running

csv2rdf can be run from the command line given the location of either a tabular data file or a metadata file that references the tabular file it describes. The location can be either a path on the local machine or a URI for a document on the web.

To run from a tabular file:

java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv

The resulting RDF is written to standard output in Turtle format. The output can instead be written to a file with the -o option:

java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv -o output.ttl

The extension of the output file determines the output format. The full list of supported formats is defined by rdf4j; some common formats are listed below:

Extension Format
.ttl turtle
.nt n-triples
.xml rdf-xml
.trig trig
.nq n-quads

Note that for quad formats like trig and n-quads the graph will be nil.
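
As a quick reference, the extension table above can be sketched as a small shell function. This is an illustration only: format_for is a hypothetical helper and not part of csv2rdf, and it covers only the extensions listed here rather than the full set defined by rdf4j.

```shell
# Hypothetical helper (not part of csv2rdf): report the format name that the
# table above associates with an output file's extension.
format_for() {
  case "$1" in
    *.ttl)  echo "turtle" ;;
    *.nt)   echo "n-triples" ;;
    *.xml)  echo "rdf-xml" ;;
    *.trig) echo "trig" ;;
    *.nq)   echo "n-quads" ;;
    *)      echo "unknown" ;;   # not listed in the table above
  esac
}

format_for output.ttl
format_for data.nq
```

A check like this can be handy in a wrapper script to catch a mistyped extension before starting a long conversion.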

The triples are generated according to CSVW standard mode by default. The mode to use can be specified by the -m parameter:

java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv -m minimal

The supported values for the mode are standard, minimal and annotated. annotated is a non-standard mode that behaves like minimal mode, except that any notes or non-standard annotations defined for table groups and tables are also output if the corresponding metadata element specifies an @id.
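
To compare the output of the three modes, a dry-run sketch like the following prints one command per mode instead of running it. It assumes that -t, -m and -o can be combined in a single invocation; mode_commands and the output-<mode>.ttl file names are hypothetical.

```shell
# Hypothetical helper: print (do not run) one csv2rdf command per mode.
# Assumes -t, -m and -o can be used together; output names are placeholders.
mode_commands() {
  input=$1
  for mode in standard minimal annotated; do
    echo "java -jar csv2rdf-standalone.jar -t $input -m $mode -o output-$mode.ttl"
  done
}

mode_commands /path/to/tabular/file.csv
```

Piping the printed lines to sh would execute them once the paths point at real files.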

The recommended way to start processing a tabular file is from a metadata document that describes the structure of a referenced tabular file. The tabular file does not need to be provided when processing from a metadata file since the metadata should contain a reference to the tabular file(s).

java -jar csv2rdf-standalone.jar -u /path/to/metadata/file.json -o output.ttl
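
For readers who have not written a CSVW metadata document before, the sketch below creates a small CSV file and a minimal metadata file that references it through the url property. The file names, column names and temporary directory are placeholders chosen for illustration; only the CSVW property names (@context, url, tableSchema, columns, name, titles, datatype) come from the W3C specification.

```shell
# Create a scratch directory holding a sample CSV and its metadata document.
# All file and column names here are hypothetical examples.
workdir=$(mktemp -d)

cat > "$workdir/people.csv" <<'EOF'
id,name
1,Alice
2,Bob
EOF

# Minimal CSVW metadata: "url" points at the tabular file, relative to the
# metadata document, and "tableSchema" describes its columns.
cat > "$workdir/people.csv-metadata.json" <<'EOF'
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "people.csv",
  "tableSchema": {
    "columns": [
      {"name": "id", "titles": "id", "datatype": "integer"},
      {"name": "name", "titles": "name", "datatype": "string"}
    ]
  }
}
EOF

# Print the command that would start the conversion from the metadata file.
echo "java -jar csv2rdf-standalone.jar -u $workdir/people.csv-metadata.json -o output.ttl"
```

Because the metadata names the tabular file itself, no -t option appears in the printed command.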

Running with docker

Docker images are published to the public repository europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf. These can be run by specifying the image version to run, and mapping volumes into the container to make local files available within the container e.g.

docker run --rm -v .:/data europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf:v0.7.1 -t /data/input.csv -o /data/output.ttl

Note that file paths should be specified relative to the container, not the local system.
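
The path translation can be sketched as follows, assuming the current directory is mounted at /data as in the command above. container_path is a hypothetical helper, and the command is printed rather than executed.

```shell
# Hypothetical helper: map a file in the current directory to the path it has
# inside the container when that directory is mounted at /data.
container_path() {
  echo "/data/$(basename "$1")"
}

image=europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf:v0.7.1

# Print the docker command; the arguments after the image use container paths.
echo "docker run --rm -v $(pwd):/data $image -t $(container_path ./input.csv) -o $(container_path ./output.ttl)"
```

The sketch mounts $(pwd) rather than a relative path, which some Docker versions do not accept as a bind-mount source.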

Using as a library

csv2rdf also exposes its functionality as a library - please see the csv2rdf library for a description of the library and its interface.

Deploying new builds

To compile and deploy new native image builds for all supported architectures, create a release in the GitHub UI tagged to a commit.

License

Copyright © 2018 Swirrl IT Ltd.

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.