The latest official version of the Earth Metabolome Initiative (EMI) ontology is available in emi.ttl.
It can replace, for example, the enpkg vocabulary.
Any ontology issue, change, or suggestion should be reported against the emi.ttl file. The ontology documentation, the other ontology files in the docs folder and in ontop_config, and the emi_no_import.ttl file are all generated from emi.ttl. The emi_no_import.ttl file is the same as emi.ttl, but without the imported ontologies.
The ontology can be opened and edited with a text editor or an ontology editor such as Protege.
For more details, see the EMI ontology documentation. The ontology documentation is fully generated with the WIDOCO tool. The WIDOCO-generated files are in the docs folder.
The npc_taxonomy.ttl file is a SKOS-based OWL ontology for the structural classification of natural products, derived from the NPClassifier tool. This OWL ontology was generated with the script in scripts.
For more details, see Natural Product Classifier vocabulary.
A knowledge graph was generated based on the EMI ontology with the pf1600 dataset and structure metadata dataset sqlite. It contains more than 32 million triples and is accessible and downloadable via the SPARQL endpoint: https://biosoda.unil.ch/graphdb/sparql.
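As a minimal sketch of how the endpoint above could be queried programmatically, the Python snippet below builds a SPARQL-protocol GET request with the standard library only. The helper names (`build_sparql_request`, `run_query`) and the sample query are illustrative assumptions and are not part of the EMI tooling; the endpoint is assumed to accept the standard `query` parameter and to return SPARQL JSON results.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://biosoda.unil.ch/graphdb/sparql"

# Illustrative query (an assumption): fetch a handful of arbitrary triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build a GET request that passes the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the query and return the result bindings (needs network access)."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(SAMPLE_QUERY)` should return a list of dictionaries, one per result row, keyed by the variable names `s`, `p`, and `o`. Keeping a `LIMIT` in exploratory queries is advisable given the size of the graph.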
Summary
This tutorial uses a toy dataset and mainly requires MySQL (version 8) and Ontop (version 5.1 or later).
Download the toy dataset from ENPKG full.
Download and install MySQL 8.2.
To check that MySQL was installed correctly, run:
mysql --version
Install the dependencies listed in the Pipfile:
cd ./scripts/sql_insert_emi_data
pipenv install
If you do not have pipenv, install it as shown below (see more instructions).
pip install pipenv --user
If you have any issue connecting, see https://gist.github.com/zubaer-ahammed/c81c9a0e37adc1cb9a6cdc61c4190f52?permalink_comment_id=4473133
From the root of this directory, create the emi_db database in the MySQL server with the SQL statements from raw_mysql_schema.sql:
mysql -u root -p < ./scripts/sql_insert_emi_data/raw_mysql_schema.sql
NOTE: Optionally, if an emi_db database already exists in your MySQL server and you want to start from scratch (i.e., with an empty database), drop it before running the raw_mysql_schema.sql script with the command above. Note that the data will be added to the database allowing duplicates. The command below drops emi_db.
mysql -u root -p --execute="DROP DATABASE IF EXISTS emi_db ;"
You can connect to the database as shown below
mysql -u root -p
Check if the schema was created
show databases;
use emi_db;
show tables;
Alternatively, you can use MySQL Workbench to work with the emi_db database:
mysql-workbench
NOTE: We observed that the structure_metadata (sqlite) file is missing. As an alternative, you can download an example from https://zenodo.org/records/12534675.
mysql -u root -p
SHOW VARIABLES LIKE "local_infile";
SET GLOBAL local_infile = 1;
SHOW VARIABLES LIKE "local_infile";
Loading local data is now enabled. To check it, you can run:
mysql> SHOW VARIABLES LIKE "local_infile";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| local_infile | ON |
+---------------+-------+
1 row in set (0,01 sec)
Edit the scripts/sql_insert_emi_data/config.py file and make sure that the paths point to the correct files.
NOTE: To also generate a SKOS-based version of the Open Tree of Life, download the TSV files from https://tree.opentreeoflife.org/about/taxonomy-version and set the directory path to these files in config.py by replacing the None value with this path.
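Before starting the insertion, it can help to confirm that every configured path exists. The snippet below is a minimal sketch of such a check; the dictionary keys and example values are hypothetical and must be replaced with the names and paths actually used in config.py.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names whose path does not exist; None values are skipped."""
    return [
        name
        for name, value in paths.items()
        if value is not None and not Path(value).exists()
    ]


# Hypothetical entries: replace them with the settings defined in config.py.
example_paths = {
    "toy_dataset_dir": "./data/toy_dataset",  # hypothetical name and location
    "open_tree_of_life_dir": None,  # left as None when the optional TSV files are not used
}

missing = find_missing_paths(example_paths)
if missing:
    print("Missing paths:", ", ".join(missing))
```

An empty result means that all non-None paths were found on disk.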
Run the command below to initiate the insertion into the emi_db database.
pipenv run python ./scripts/sql_insert_emi_data/main.py
NOTE: Alternatively, if all dependencies listed in the Pipfile are installed in your Python environment, you can run:
python ./scripts/sql_insert_emi_data/main.py
IMPORTANT: This tutorial was only tested with Python 3.9, but it might work with other 3.x versions.
Download and unzip the Ontop CLI tool from https://sourceforge.net/projects/ontop4obda/files/ontop-5.1.1/ontop-cli-5.1.1.zip/download
Get MySQL JDBC driver
We recommend downloading mysql-connector-j-8.2.0.jar from the MySQL download archive at
https://downloads.mysql.com/archives/c-j/
Move mysql-connector-j-8.2.0.jar to the ontop-cli-5.1.1/lib folder.
Create the Ontop properties text file ./ontop_config/emi-v0_2/emi-v0_2.properties, following the example below (change the user, password, and, if necessary, the url parameter):
jdbc.password=root
jdbc.user=root
jdbc.name=5e86f1b2-b7d8-4a17-9bc6-32b98b12ed79
jdbc.url=jdbc\:mysql\://localhost\:3306/emi_db
jdbc.driver=com.mysql.cj.jdbc.Driver
ontop.inferDefaultDatatype=True
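To avoid typing credentials into the file by hand, the properties text could also be rendered from Python. This is a minimal sketch that reproduces the keys of the example above; the function names and the environment variable names (`EMI_DB_USER`, `EMI_DB_PASSWORD`) are assumptions made for illustration, and only the ':' escaping shown in the example is handled, so values containing other special characters may need extra care.

```python
import os
import uuid


def escape_colons(value):
    """Escape ':' as '\\:' to match the jdbc.url style of the example file."""
    return value.replace(":", "\\:")


def render_ontop_properties(user, password,
                            url="jdbc:mysql://localhost:3306/emi_db",
                            name=None):
    """Return the text of an Ontop properties file with the example's keys."""
    lines = [
        f"jdbc.password={password}",
        f"jdbc.user={user}",
        f"jdbc.name={name or uuid.uuid4()}",
        f"jdbc.url={escape_colons(url)}",
        "jdbc.driver=com.mysql.cj.jdbc.Driver",
        "ontop.inferDefaultDatatype=True",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical environment variables; both fall back to the example's "root".
properties_text = render_ontop_properties(
    user=os.environ.get("EMI_DB_USER", "root"),
    password=os.environ.get("EMI_DB_PASSWORD", "root"),
)
```

The resulting `properties_text` can then be written to ./ontop_config/emi-v0_2/emi-v0_2.properties, for example with `pathlib.Path.write_text`.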
PATH/TO/ontop-cli-5.1.1/ontop materialize -m ./ontop_config/emi-v0_2/emi-v0_2.obda -t ./ontop_config/emi-v0_2/emi-v0_2.ttl -p ./ontop_config/emi-v0_2/emi-v0_2.properties -f turtle --enable-annotations --separate-files -o ./data/ontop
NOTE: You can allocate more memory to Ontop by editing the PATH/TO/ontop-cli-5.1.1/ontop file. For instance, set ONTOP_JAVA_ARGS="-Xmx16g" instead of ONTOP_JAVA_ARGS="-Xmx1g".
NOTE: If necessary, specify the classpath for the MySQL connector .jar file:
export CLASSPATH=$CLASSPATH:/Applications/ontop-cli-5.1.1/lib/mysql-connector-j-8.2.0.jar
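The materialization command is long, so a small wrapper can make it easier to rerun. The sketch below assembles the same `ontop materialize` arguments used in this tutorial as a list and runs them with `subprocess`; the function names and their default values are illustrative assumptions, and the path to the `ontop` launcher must be adapted to your installation.

```python
import subprocess


def build_materialize_command(ontop_bin,
                              config_dir="./ontop_config/emi-v0_2",
                              prefix="emi-v0_2",
                              output="./data/ontop"):
    """Assemble the 'ontop materialize' call used in this tutorial as a list."""
    return [
        ontop_bin, "materialize",
        "-m", f"{config_dir}/{prefix}.obda",        # mapping file
        "-t", f"{config_dir}/{prefix}.ttl",         # ontology file
        "-p", f"{config_dir}/{prefix}.properties",  # JDBC connection settings
        "-f", "turtle",
        "--enable-annotations",
        "--separate-files",
        "-o", output,
    ]


def run_materialize(ontop_bin, **kwargs):
    """Run the materialization; raises CalledProcessError if Ontop fails."""
    subprocess.run(build_materialize_command(ontop_bin, **kwargs), check=True)
```

For example, `run_materialize("PATH/TO/ontop-cli-5.1.1/ontop")` would run the command with the default configuration paths, from the root of this directory.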
Importing the generated RDF-based files into a triple store
For GraphDB 10.6, see Loading data using importrdf.
For Stardog, see the Adding data documentation section.
For Virtuoso, see Loading RDF data.
Ontop allows us to build virtual knowledge graphs (VKGs). With its plugin for Protege, we can query the VKG. For more information, see the section Setting up the VKG using Ontop-Protégé.
NOTE: We recommend downloading and using Ontop+Protege 5.1.1. To build the VKG, you will also need all configuration files used to materialize the VKG in subsection Generating the EMI-based RDF graph, notably ./ontop_config/emi-v0_2/emi-v0_2.obda, ./ontop_config/emi-v0_2/emi-v0_2.ttl, and ./ontop_config/emi-v0_2/emi-v0_2.properties.
A full tutorial about Ontop-Protégé is available at https://doi.org/10.1016/j.patter.2021.100346.