MaastrichtU-IDS / d2s-scripts-repository

📜 Transformation scripts to build Data2Services knowledge graphs (SPARQL insert queries, RML mappings, Shell commands).

Get started

This repository stores transformation scripts used in the data2services-pipeline to build Data2Services knowledge graphs (SPARQL insert queries, RML mapping files).

The SPARQL queries are usually executed against the target repository using the data2services-sparql-operations Docker image.

Clone

Cloning is optional: data2services-sparql-operations can run the SPARQL queries directly from a GitHub repository URL (see below).

git clone --recursive https://github.com/MaastrichtU-IDS/d2s-transform-repository
cd d2s-transform-repository

Pull

Pull the data2services-sparql-operations image, which executes multiple queries (all .rq files in a folder or in a GitHub repository).

docker pull vemonet/data2services-sparql-operations

Run

Execute all SPARQL INSERT queries found at a GitHub repository URL. See the data2services-sparql-operations GitHub repository for more documentation.

# Command to load UniProt organisms and human proteins as BioLink
docker run -it --link graphdb:graphdb \
  vemonet/data2services-sparql-operations \
  -f "https://github.com/MaastrichtU-IDS/d2s-transform-repository/tree/master/sparql/insert-biolink/uniprot" \
  -ep "http://graphdb:7200/repositories/test/statements" \
  -un MYUSERNAME -pw MYPASSWORD \
  --var-output https://w3id.org/data2services/graph/biolink/uniprot
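For illustration only, the "execute every .rq file of a folder" behaviour can be roughly approximated with curl against a SPARQL 1.1 Update endpoint. This is a minimal sketch under assumptions, not the image's implementation. `run_rq_folder` is a hypothetical helper. It assumes the endpoint accepts the standard `update` form parameter, as the GraphDB `/statements` endpoint used above does. It also skips the variable substitution done by the `--var-*` options, so queries that rely on those variables will not run unchanged.

```shell
# Minimal sketch, NOT how data2services-sparql-operations is implemented:
# POST every .rq file of a local folder, in filename order, to a SPARQL 1.1
# Update endpoint with curl. The variable substitution performed by the
# image's --var-* options is not reproduced.
# Usage: run_rq_folder <folder> <update-endpoint> <user> <password>
run_rq_folder() {
  folder="$1"
  endpoint="$2"
  user="$3"
  password="$4"
  for query_file in "$folder"/*.rq; do
    # If no file matches, the pattern is left unexpanded: skip it
    [ -e "$query_file" ] || continue
    printf 'Executing %s\n' "$query_file"
    # Send the URL-encoded file content as the "update" form parameter;
    # stop at the first failing query
    curl --fail --silent --show-error \
      -u "$user:$password" \
      --data-urlencode "update@$query_file" \
      "$endpoint" || return 1
  done
}
```

For example, `run_rq_folder my-queries http://localhost:7200/repositories/test/statements MYUSERNAME MYPASSWORD`, where `my-queries` is a placeholder folder of self-contained queries. Files are sent one by one so that a failing query stops the run instead of leaving later queries to run on partial data.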

Other examples

Transform datasets from the generic RDF generated by the Data2Services pipeline to the BioLink model (e.g. DrugBank or HGNC).

# DrugBank conversion from xml2rdf generic RDF
docker run -it --link graphdb:graphdb \
  vemonet/data2services-sparql-operations \
  -f "https://github.com/MaastrichtU-IDS/d2s-transform-repository/tree/master/sparql/insert-biolink/drugbank" \
  -ep "http://graphdb:7200/repositories/test/statements" \
  -un MYUSERNAME -pw MYPASSWORD \
  --var-service http://localhost:7200/repositories/test --var-input https://w3id.org/data2services/graph/xml2rdf --var-output https://w3id.org/data2services/biolink/drugbank

# HGNC conversion from AutoR2RML generic RDF
docker run -it --link graphdb:graphdb \
  vemonet/data2services-sparql-operations \
  -f "https://github.com/MaastrichtU-IDS/d2s-transform-repository/tree/master/sparql/insert-biolink/hgnc" \
  -ep "http://graphdb:7200/repositories/test/statements" \
  -un MYUSERNAME -pw MYPASSWORD \
  --var-service http://localhost:7200/repositories/test --var-input https://w3id.org/data2services/graph/autor2rml --var-output https://w3id.org/data2services/biolink/hgnc
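The DrugBank and HGNC commands differ only in the query folder, the input graph and the output graph. A small shell wrapper can factor out the shared flags. This is a minimal sketch that assumes the same setup as the examples: a GraphDB container named `graphdb` with a repository named `test`. `d2s_to_biolink`, `BIOLINK_QUERIES_URL`, `GRAPHDB_USER`, `GRAPHDB_PASSWORD` and `DRY_RUN` are hypothetical names introduced here, not options of the image.

```shell
# Hypothetical helper (not part of data2services-sparql-operations): runs the
# BioLink insert queries of one dataset folder with the flags used above.
# Assumes a running GraphDB container named "graphdb" with a "test" repository.
BIOLINK_QUERIES_URL="https://github.com/MaastrichtU-IDS/d2s-transform-repository/tree/master/sparql/insert-biolink"
GRAPHDB_USER="${GRAPHDB_USER:-MYUSERNAME}"
GRAPHDB_PASSWORD="${GRAPHDB_PASSWORD:-MYPASSWORD}"

# Usage: d2s_to_biolink <dataset-folder> <input-graph> <output-graph>
d2s_to_biolink() {
  dataset="$1"
  input_graph="$2"
  output_graph="$3"
  # With DRY_RUN=1, "echo" prints the command instead of running it
  runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  $runner docker run -it --link graphdb:graphdb \
    vemonet/data2services-sparql-operations \
    -f "$BIOLINK_QUERIES_URL/$dataset" \
    -ep "http://graphdb:7200/repositories/test/statements" \
    -un "$GRAPHDB_USER" -pw "$GRAPHDB_PASSWORD" \
    --var-service http://localhost:7200/repositories/test \
    --var-input "$input_graph" --var-output "$output_graph"
}
```

For example, `d2s_to_biolink drugbank https://w3id.org/data2services/graph/xml2rdf https://w3id.org/data2services/biolink/drugbank` reproduces the DrugBank command above. Reading the credentials from environment variables keeps them out of the command history.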

Using SPARQL for conversion

Two types of generic SPARQL queries are generated:

How we do it

BioLink model documentation: https://biolink.github.io/biolink-model/

BioLink model ontology (Turtle): https://raw.githubusercontent.com/biolink/biolink-model/master/ontology/biolink.ttl

BioLink model on BioPortal: https://bioportal.bioontology.org/ontologies/BLM

Limitations

Enhancement

Also generate a generic CONSTRUCT query when generating the generic RDF (with AutoR2RML and xml2rdf).

Then we "just" have to match it with the right bioentity class.

The programs performing the generic RDF transformation should generate a file that formally describes the data structure; this file is then used to generate the SPARQL CONSTRUCT query.
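The proposed enhancement can be illustrated with a hypothetical generator. It takes a structure description and prints a generic CONSTRUCT query, leaving the target class as the only dataset-specific input. The description file format does not exist yet, so this sketch reduces it to a namespace and a list of column names. `generate_construct`, the `?entity` variable and the example namespace are assumptions made for illustration.

```shell
# Hypothetical sketch of the proposed enhancement. The real structure
# description file does not exist yet; here it is reduced to a namespace
# and a list of column names, all chosen for illustration.
# Usage: generate_construct <generic-namespace> <target-class-iri> <column>...
generate_construct() {
  ns="$1"
  target_class="$2"
  shift 2
  # Column names are reused as SPARQL variable names, so restrict them
  for column in "$@"; do
    case "$column" in
      ''|*[!A-Za-z0-9_]*)
        printf 'Invalid column name: %s\n' "$column" >&2
        return 1 ;;
    esac
  done
  printf 'CONSTRUCT {\n'
  # The only dataset-specific part: the target BioLink class
  printf '  ?entity a <%s> .\n' "$target_class"
  for column in "$@"; do
    printf '  ?entity <%s%s> ?%s .\n' "$ns" "$column" "$column"
  done
  printf '}\nWHERE {\n'
  # Every listed column is required: entities missing one are skipped
  for column in "$@"; do
    printf '  ?entity <%s%s> ?%s .\n' "$ns" "$column" "$column"
  done
  printf '}\n'
}
```

For example, `generate_construct http://example.org/generic/ https://w3id.org/biolink/vocab/Gene symbol name` prints a query that types every entity having both columns as a BioLink Gene. A real generator would also map each generic property to the matching BioLink property and would likely wrap optional columns in `OPTIONAL` blocks.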

Using RML for conversion

RMLStreamer

Scala implementation; it requires streaming the files to Flink through Kafka.

Documentation on running it with Docker will be released soon.

RMLMapper

Java implementation, not used because of scalability issues.

# Build
mvn clean package -DskipTests
# Run
java -jar /data/rmlmapper-java/target/rmlmapper-4.1.0-r55-jar-with-dependencies.jar -c /data/drugbank/drugbank_config.properties

RocketRML

Node.js implementation focused on XML and JSON conversion.