MaastrichtU-IDS / d2s-sparql-operations

✨️ Execute SPARQL queries from string, URL or multiple files using the RDF4J framework.
https://maastrichtu-ids.github.io/d2s-sparql-operations/
MIT License

A Java CLI to upload RDF files, and execute SPARQL queries from string, URL or multiple files using RDF4J.

See the Data2Services framework documentation to run d2s-sparql-operations as part of workflows that generate RDF knowledge graphs from structured data.


Use the jar

Download the .jar file from the latest GitHub release. You can do this automatically from a Bash terminal with:

wget https://github.com/MaastrichtU-IDS/d2s-sparql-operations/releases/latest/download/sparql-operations.jar

Move the jar somewhere you can call it easily, e.g. in a bin folder in your home folder:

mkdir -p ~/bin && mv sparql-operations.jar ~/bin/sparql-operations.jar

Then run it, e.g. to upload RDF files to a SPARQL endpoint:

java -jar ~/bin/sparql-operations.jar -o upload -i "*.ttl" -e "https://graphdb.dumontierlab.com/repositories/test/statements" -u 'username' -p 'password' -g "http://my-graph.com"

The -g "http://my-graph.com" argument is optional; use it to specify the graph to upload the data to.
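
To avoid typing the full path of the jar on every call, a small shell function can wrap it. This is only a minimal sketch: the function name sparql_operations and the D2S_JAR variable are hypothetical names chosen for illustration, not something the tool defines.

```shell
# Hypothetical wrapper: sparql_operations and D2S_JAR are illustrative names.
# Falls back to the ~/bin location used above when D2S_JAR is not set,
# and forwards every argument unchanged to the jar.
sparql_operations() {
  java -jar "${D2S_JAR:-$HOME/bin/sparql-operations.jar}" "$@"
}
```

Once the function is defined (for example in ~/.bashrc), calling sparql_operations with the same arguments as the java -jar command above should behave identically.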

You can also define the username and password using environment variables:

export D2S_USERNAME=myusername
export D2S_PASSWORD=mypassword
java -jar ~/bin/sparql-operations.jar -o select -q "SELECT * WHERE {?s ?p ?o .} LIMIT 10" -e "https://graphdb.dumontierlab.com/repositories/test"
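
When relying on the environment variables, it can help to fail early if one of them is missing. The check below is a sketch under that assumption; require_d2s_credentials is a hypothetical helper name, and only D2S_USERNAME and D2S_PASSWORD come from this README.

```shell
# Hypothetical helper: returns non-zero when a credential variable is
# unset or empty, so a script can stop before calling the jar.
require_d2s_credentials() {
  if [ -z "${D2S_USERNAME:-}" ] || [ -z "${D2S_PASSWORD:-}" ]; then
    echo "D2S_USERNAME and D2S_PASSWORD must both be set" >&2
    return 1
  fi
}
```

In a script, a line such as `require_d2s_credentials || exit 1` placed before the java -jar call would stop the run when the credentials are not defined.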

See below for more examples of executing SPARQL queries and other operations.

Build the jar

Compile the jar file from the source code:

mvn clean package

Move it:

mv target/sparql-operations-*-jar-with-dependencies.jar ~/bin/sparql-operations.jar

Use Docker

Pull

Available on DockerHub. The latest image is automatically built from the latest commit on the master branch on GitHub.

docker pull umids/d2s-sparql-operations

Build

You can also clone the GitHub repository and build the Docker image locally (unnecessary if you use docker pull):

git clone https://github.com/MaastrichtU-IDS/d2s-sparql-operations
cd d2s-sparql-operations
docker build -t umids/d2s-sparql-operations .

Run

N.B.: on Windows PowerShell, remove the trailing \ characters and put each docker run command on a single line.
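
Joining a multi-line command by hand is error-prone, so here is a rough sketch of a helper that does it from a Bash terminal. The name to_one_line is hypothetical; it only strips the trailing backslashes and joins the lines, so quoting and variables may still need adapting for PowerShell.

```shell
# Hypothetical helper: read a backslash-continued command on stdin
# and print it as a single line.
to_one_line() {
  # 1) drop the trailing backslash (and spaces around it) on each line,
  # 2) turn newlines into spaces,
  # 3) squeeze runs of whitespace and remove the trailing space.
  sed -e 's/[[:space:]]*\\[[:space:]]*$//' |
    tr '\n' ' ' |
    sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/ $//'
}

# Example with the select command from this README (quoted heredoc keeps
# the backslashes literal).
to_one_line <<'EOF'
docker run -it --rm umids/d2s-sparql-operations -o select \
  -q "select distinct ?Concept where {[] a ?Concept} LIMIT 10" \
  -e "http://dbpedia.org/sparql"
EOF
```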

Usage

docker run -it --rm umids/d2s-sparql-operations -h

Upload

Upload RDF files to a SPARQL endpoint:

docker run -it --rm -v $(pwd):/data umids/d2s-sparql-operations -o upload \
  -i "*.ttl" \
  -e "https://graphdb.dumontierlab.com/repositories/test/statements" \
  -u $USERNAME -p $PASSWORD \
  -g "http://my-graph.com"

Select

Run a select query on DBpedia, passing the SPARQL query string as an argument.

docker run -it --rm umids/d2s-sparql-operations -o select \
  -q "select distinct ?Concept where {[] a ?Concept} LIMIT 10" \
  -e "http://dbpedia.org/sparql"

Update

Execute multiple INSERT queries on graphdb.dumontierlab.com, using the query files found in a folder (here a GitHub repository folder given by its URL).

# Alternative: pass the statements endpoint URL directly:
#   -e "https://graphdb.dumontierlab.com/repositories/test/statements"
docker run -it --rm umids/d2s-sparql-operations \
  -e "https://graphdb.dumontierlab.com" -r "test" \
  -o update -u $USERNAME -p $PASSWORD \
  -i "https://github.com/MaastrichtU-IDS/d2s-sparql-operations/tree/master/src/main/resources/insert-examples"

Construct

On graphdb.dumontierlab.com using GitHub URL to get the SPARQL query from a file.

docker run -it --rm umids/d2s-sparql-operations -o construct \
  -e "https://graphdb.dumontierlab.com/repositories/ncats-red-kg" \
  -i "https://raw.githubusercontent.com/MaastrichtU-IDS/d2s-sparql-operations/master/src/main/resources/example-construct-pathways.rq" 

GitHub repository

We crawl the example GitHub repository and execute each .rq file.

docker run -it --rm umids/d2s-sparql-operations \
  -o select -e "http://dbpedia.org/sparql" \
  -i "https://github.com/MaastrichtU-IDS/d2s-sparql-operations/tree/master/src/main/resources/select-examples" 

Crawling a GitHub repository from its URL relies on HTML parsing, so it might be unstable.


YAML

A YAML file can be used to provide multiple ordered queries. See example from GitHub.

docker run -it --rm umids/d2s-sparql-operations \
  -o select -e "http://dbpedia.org/sparql" \
  -i "https://raw.githubusercontent.com/MaastrichtU-IDS/d2s-sparql-operations/master/src/main/resources/example-queries.yaml"

Split

Beta: split an object into multiple statements using a delimiter, and insert the statements generated by the split into the same graph.

E.g.: a statement with the value "1234,345,768" would be split into 3 statements with the values "1234", "345" and "768".
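
To make the expected result concrete, the snippet below mimics the split on a plain string. It is an illustration of the outcome only, not how the tool works internally, and it assumes a single-character delimiter.

```shell
# Illustration only: split a delimited value into one value per line.
value="1234,345,768"
delimiter=","
# tr turns each delimiter character into a newline.
parts=$(printf '%s' "$value" | tr "$delimiter" '\n')
printf '%s\n' "$parts"
```

Each printed line corresponds to the object of one statement generated by the split.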

# --split-delete deletes the statement that was split
# --uri-expansion: use 'infer' to do it automatically using prefixcommons
# -e is the RDF4J server URL, -rep the RDF4J server repository ID
# Optional: --trim-delimiter '"'
docker run -it \
  umids/d2s-sparql-operations -op split \
  --split-property "http://w3id.org/biolink/vocab/has_participant" \
  --split-class "http://w3id.org/biolink/vocab/GeneGrouping" \
  --split-delimiter "," \
  --split-delete \
  --uri-expansion "https://w3id.org/d2s/" \
  -e "https://graphdb.dumontierlab.com" \
  -rep "test" \
  -u USERNAME -pw PASSWORD

# For SPARQLRepository
#  -e "https://graphdb.dumontierlab.com/repositories/test" \
#  -uep "https://graphdb.dumontierlab.com/repositories/test/statements" \

Set variables

Three variables can be set in SPARQL queries using the ?_ prefix: ?_input, ?_output and ?_service. See this example:

INSERT {
  GRAPH <?_output> {
    ?Concept a <https://w3id.org/d2s/Concept> .
  }
} WHERE {
  SERVICE <?_service> {
    GRAPH <?_input> {
      SELECT * {
        [] a ?Concept .
      } LIMIT 10 
} } }

Execute:

docker run -it --rm umids/d2s-sparql-operations \
  -op update -e "https://graphdb.dumontierlab.com/repositories/test/statements" \
  -u $USERNAME -pw $PASSWORD \
  -i "https://raw.githubusercontent.com/MaastrichtU-IDS/d2s-sparql-operations/master/src/main/resources/example-insert-variables.rq" \
  --var-input http://www.ontotext.com/explicit \
  --var-output https://w3id.org/d2s/output \
  --var-service http://localhost:7200/repositories/test
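
To see what a query looks like once the three variables are filled in, the substitution can be imitated with sed. This is a sketch of the effect of the --var-* arguments, assuming a plain textual replacement; the tool's actual implementation may differ, and fill_query_variables is a hypothetical helper name.

```shell
# Hypothetical helper illustrating the effect of the --var-* arguments.
# Arguments: input graph URI, output graph URI, service URL; query on stdin.
# Assumes the URIs contain no '|' or '&' characters (special to sed here).
fill_query_variables() {
  sed -e "s|?_input|$1|g" -e "s|?_output|$2|g" -e "s|?_service|$3|g"
}

# Shortened one-line query used only for this illustration.
printf '%s\n' 'INSERT { GRAPH <?_output> { ?s ?p ?o } } WHERE { SERVICE <?_service> { GRAPH <?_input> { ?s ?p ?o } } }' |
  fill_query_variables \
    "http://www.ontotext.com/explicit" \
    "https://w3id.org/d2s/output" \
    "http://localhost:7200/repositories/test"
```

Printing the filled-in query this way can be a quick check that the graph and service URIs end up where intended before running the update.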

Examples

From data2services-transform-repository, use a federated query to transform generic RDF generated by AutoR2RML and xml2rdf to the BioLink model, and load it to a different repository.

# DrugBank
docker run -it --rm -v "$PWD/sparql/insert-biolink/drugbank":/data \
  umids/d2s-sparql-operations \
  -i "/data" -u USERNAME -pw PASSWORD \
  -e "https://graphdb.dumontierlab.com/repositories/ncats-test/statements" \
  --var-service http://localhost:7200/repositories/test \
  --var-input http://data2services/graph/xml2rdf \
  --var-output https://w3id.org/d2s/graph/biolink/drugbank

# HGNC
docker run -it --rm -v "$PWD/sparql/insert-biolink/hgnc":/data \
  umids/d2s-sparql-operations \
  -i "/data" -u USERNAME -pw PASSWORD \
  -e "https://graphdb.dumontierlab.com/repositories/ncats-test/statements" \
  --var-service http://localhost:7200/repositories/test \
  --var-input http://data2services/graph/autor2rml \
  --var-output https://w3id.org/d2s/graph/biolink/hgnc