The suite is divided into two separate, extensible parts:
hydrators enable users to import and populate data into a graph database. They are not called importers because import is a reserved keyword in Python, and from importers import importer is a bit confusing. :dizzy_face:
actions provide different tools to work with the generated graph. The first and most important is to run a series of tests to validate the constraints Data Wranglers want to impose on submissions. Another action is generating reports and extracting statistics from the graph to send to the submitters. Any other actions can be implemented to extend the suite.
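As a rough illustration of what "extensible" could mean here, the sketch below keeps hydrators and actions in two name-to-function registries, so a new one is added by registering another function. The registry names, the decorator and the placeholder functions are all hypothetical; the suite's actual plugin mechanism may look quite different.

```python
from typing import Callable, Dict, List

# Hypothetical registries; not the suite's real internals.
HYDRATORS: Dict[str, Callable[[str], str]] = {}
ACTIONS: Dict[str, Callable[[str], str]] = {}


def register(registry: Dict[str, Callable[[str], str]], name: str):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        registry[name] = func
        return func
    return decorator


@register(HYDRATORS, "xls")
def hydrate_from_spreadsheet(source: str) -> str:
    # Placeholder body: a real hydrator would populate the graph database.
    return f"would load spreadsheet {source} into the graph"


@register(ACTIONS, "test")
def run_tests(target: str) -> str:
    # Placeholder body: a real action would query the generated graph.
    return f"would run the tests in {target} against the graph"


def available(registry: Dict[str, Callable[[str], str]]) -> List[str]:
    """List the registered names in alphabetical order."""
    return sorted(registry)
```

With this layout, extending the suite amounts to decorating one more function; nothing else needs to change.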
The functionality planned so far is as follows (WIP items are not yet fully implemented):
Hydrators:
Actions:
The Graph Validator Suite requires Docker to be running on the host machine.
git clone git@github.com:ebi-ait/ingest-graph-validator.git
cd ingest-graph-validator
python -m venv .venv
source .venv/bin/activate
pip install -e .
Ensure Docker is installed and running.
Create a file for the GCP credentials in your HOME folder, ~/.secrets/gcp_credentials, and copy into it the secrets stored in AWS Secrets Manager under the key ingest/{ENV}/gcp-credentials.json, where ENV is one of the following values:
dev for the dev environment
staging for the staging environment
prod for the production environment
Run neo4j locally (in another terminal session, or with the -d (detached) flag):
docker run -p7687:7687 -p7474:7474 --env NEO4J_AUTH=neo4j/password --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes neo4j:3.5.14-enterprise
export INGEST_GRAPH_VALIDATOR_INGEST_API_URL=https://api.ingest.archive.data.humancellatlas.org/
(or http://localhost:8080 when running against a local Ingest API)
Initialize the database backend and enable a frontend visualizer to query the database, available at http://localhost:7474 by default, by running this on the command line:
ingest-graph-validator init
Import a spreadsheet:
ingest-graph-validator hydrate ingest <sub_uuid> (via ingest)
ingest-graph-validator hydrate xls <spreadsheet filename> (via a spreadsheet)
Go to http://localhost:7474 in a browser to open the frontend.
You can then start writing Cypher queries in the input field at the top of the web frontend to visualize the graph. For example:
MATCH p=(n) RETURN p
This will show the entire graph. Keep in mind that it can crash the browser on huge datasets.
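One common way to keep the browser responsive on large graphs is to cap the number of returned paths with Cypher's LIMIT clause. The helper below is only a sketch of that idea; the function name and the default of 100 are arbitrary choices, not part of the suite.

```python
def limited_graph_query(max_paths: int = 100) -> str:
    """Build the 'show everything' query, capped at `max_paths` results."""
    if max_paths <= 0:
        raise ValueError("max_paths must be a positive integer")
    # LIMIT is a standard Cypher clause that truncates the result set.
    return f"MATCH p=(n) RETURN p LIMIT {max_paths}"


if __name__ == "__main__":
    # Paste the printed query into the input field of the web frontend.
    print(limited_graph_query(25))
```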
Run tests:
ingest-graph-validator action test <path_to_tests>
For example:
ingest-graph-validator action test graph_test_set
It is possible to run the graph validator so that it listens to a RabbitMQ queue that receives submission UUIDs. Once a message is received from the queue, the hydrate and action commands are run for the given submission UUID. This is how the graph validator is run in the Ingest k8s infrastructure:
ingest-graph-validator action ingest-validator graph_test_set
The above command runs the listener for graph_test_set.
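To make the per-message flow concrete, the sketch below maps one submission UUID to the two CLI invocations described in this README (hydrate from ingest, then run the tests). It is not the listener's actual implementation, which may well call the code in-process; the function name is hypothetical and the queue handling is left out entirely.

```python
import uuid
from typing import List


def commands_for_submission(submission_uuid: str,
                            test_set: str = "graph_test_set") -> List[List[str]]:
    """Return the CLI commands to run, in order, for one queued submission."""
    # uuid.UUID raises ValueError when the string is not a well-formed UUID.
    uuid.UUID(submission_uuid)
    return [
        ["ingest-graph-validator", "hydrate", "ingest", submission_uuid],
        ["ingest-graph-validator", "action", "test", test_set],
    ]
```

Each inner list could be handed to subprocess.run as is; rejecting a malformed UUID before doing any work keeps a bad message from triggering a hydration.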
export INGEST_GRAPH_VALIDATOR_INGEST_API_URL=http://localhost:8080
docker run -p7687:7687 -p7474:7474 --env NEO4J_AUTH=neo4j/password --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes neo4j:3.5.14-enterprise
ingest-graph-validator action ingest-validator graph_test_set
curl -X PUT http://localhost:8080/submissionEnvelopes/<submission_id>/graphValidationRequestedEvent
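The curl call above can also be prepared from Python with the standard library. The sketch below only builds the request object; the function name is hypothetical, and the default base URL simply mirrors the local value used in this README.

```python
import urllib.request


def graph_validation_request(submission_id: str,
                             api_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) the PUT request that asks for graph validation."""
    url = (f"{api_url.rstrip('/')}/submissionEnvelopes/"
           f"{submission_id}/graphValidationRequestedEvent")
    return urllib.request.Request(url, method="PUT")
```

Passing the returned object to urllib.request.urlopen sends it, which requires the local Ingest API to be reachable.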
sequenceDiagram
participant UI
participant c as "ingest-core"
participant gv as "ingest-graph-validator"
participant q as RabbitMQ
participant st as "ingest-state-tracking"
gv->>q: Listen to queue
UI->>c: PUT /submissionEnvelopes/{id}/graphValidationRequestedEvent
c->>st: Request change of state to GRAPH_VALIDATION_REQUESTED
activate st
st-->>c: Commit change of state to GRAPH_VALIDATION_REQUESTED
c->>q: Add graph validation message to queue
q->>gv: Pick up message from queue
activate gv
gv->>c: PUT /submissionEnvelopes/{id}/requestGraphValidating
c->>st: Request change of state to GRAPH_VALIDATING
activate st
st-->>c: Commit change of state to GRAPH_VALIDATING
note left of gv: Begin graph validation
gv->>c: PATCH /{entity_type}/{id} update graphValidationErrors on each entity
gv->>c: PUT /submissionEnvelopes/{id}/requestGraphValid or /submissionEnvelopes/{id}/requestGraphInvalid
c->>st: Request change of state to GRAPH_VALID or GRAPH_INVALID
activate st
st-->>c: Commit change of state to GRAPH_VALID or GRAPH_INVALID
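The state changes in the diagram can be summarised as a small transition table. The sketch below models only the transitions drawn above; the class and function names are hypothetical, and mapping "no errors" to GRAPH_VALID is an assumption about how the outcome is chosen.

```python
from enum import Enum


class GraphState(Enum):
    GRAPH_VALIDATION_REQUESTED = "GRAPH_VALIDATION_REQUESTED"
    GRAPH_VALIDATING = "GRAPH_VALIDATING"
    GRAPH_VALID = "GRAPH_VALID"
    GRAPH_INVALID = "GRAPH_INVALID"


# Only the transitions shown in the sequence diagram are listed here.
ALLOWED_TRANSITIONS = {
    GraphState.GRAPH_VALIDATION_REQUESTED: {GraphState.GRAPH_VALIDATING},
    GraphState.GRAPH_VALIDATING: {GraphState.GRAPH_VALID, GraphState.GRAPH_INVALID},
}


def can_transition(current: GraphState, target: GraphState) -> bool:
    """True when the diagram shows a direct change from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def outcome(error_count: int) -> GraphState:
    """Assumed rule: a submission with no graph validation errors is valid."""
    return GraphState.GRAPH_VALID if error_count == 0 else GraphState.GRAPH_INVALID
```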
The Graph Validator Suite uses a CLI similar to git. Running a command without specifying anything else will show help for that command. At each level, the commands have different arguments and options. Running any subcommand with -h or --help will give you more information about it.
The root level commands are:
ingest-graph-validator init
starts the database backend and enables a frontend visualizer to query the database, at http://localhost:7474 by default.
ingest-graph-validator hydrate
shows the list of available hydrators.
ingest-graph-validator action
shows the list of available actions.
ingest-graph-validator shutdown
stops the backend.
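For readers unfamiliar with git-style CLIs, the sketch below shows how such nested commands can be laid out. It uses argparse from the standard library purely for illustration; this README does not say which CLI framework the suite uses, and only a few of the commands named here are reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a two-level command tree resembling the one described above."""
    parser = argparse.ArgumentParser(prog="ingest-graph-validator")
    commands = parser.add_subparsers(dest="command")

    commands.add_parser("init", help="start the database backend")
    commands.add_parser("shutdown", help="stop the backend")

    hydrate = commands.add_parser("hydrate", help="list or run hydrators")
    hydrators = hydrate.add_subparsers(dest="hydrator")
    xls = hydrators.add_parser("xls", help="import from a spreadsheet")
    xls.add_argument("filename")
    ingest = hydrators.add_parser("ingest", help="import from ingest")
    ingest.add_argument("sub_uuid")

    return parser


if __name__ == "__main__":
    cli = build_parser()
    args = cli.parse_args()
    if args.command is None:
        # Mirror the described behaviour: no command given, show the help.
        cli.print_help()
```

Every level generated this way accepts -h and --help automatically.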
MATCH p = (n)
RETURN p
MATCH p = (n)
WHERE NOT n:LABEL AND NOT n:LABEL
RETURN p
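The second query above keeps every node except those carrying the listed labels, with LABEL standing in for a real label name. Since labels end up inside the query text itself, a small builder that validates them first can help; the sketch below is one way to do it, with a hypothetical function name.

```python
from typing import Iterable


def exclude_labels_query(labels: Iterable[str]) -> str:
    """Build a query that returns all nodes not carrying any of `labels`."""
    labels = list(labels)
    if not labels:
        raise ValueError("at least one label is required")
    for label in labels:
        # Labels are interpolated into the query text, so reject anything
        # that is not a plain identifier.
        if not label.isidentifier():
            raise ValueError(f"unsafe label name: {label!r}")
    conditions = " AND ".join(f"NOT n:{label}" for label in labels)
    return f"MATCH p = (n)\nWHERE {conditions}\nRETURN p"
```

For example, passing project and protocol produces a WHERE clause that hides both kinds of node.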
This one will be shown with an example. The example selects the donor CBTM-376C from Meyer's Tissue Stability dataset, and expands the paths to show all biomaterials, processes and files linked to it.
Note: Make sure to strictly define only one node to use as the source, otherwise it will be confusing.
Note: You have to be careful not to include nodes that would link your path to another one. For example, protocol or project are linked to more than one experimental design.
The first two lines are used to select one single node from which to expand. The third line expands the path using these parameters:
n, the starting node or nodes (preferably one for your first queries).
"", the relationship filter. We are not filtering by any relationships in this query.
"-project|-protocol", the label filter. We are excluding (hence the minus sign) any nodes with the labels project or (that is represented by the |) protocol.
0 is the minimum depth. Normally 0; otherwise the starting nodes get excluded.
-1 is used to determine the maximum depth for the path expansion. -1 means no limit. If you set 1 here, the result would be the CBTM-376C donor and its first-level neighbours.
MATCH (n:donor_organism)
WHERE n.`biomaterial_core.biomaterial_id` = "CBTM-376C"
CALL apoc.path.expand(n, "", "-project|-protocol", 0, -1) YIELD path
RETURN path
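The parameters described above can be assembled programmatically, which makes the role of each one easier to see. The helpers below are a sketch with hypothetical names; they only produce the CALL line as text and do not talk to the database.

```python
from typing import Iterable


def label_filter(excluded_labels: Iterable[str]) -> str:
    """Prefix each label with '-' (exclude) and join them with '|' (or)."""
    return "|".join(f"-{label}" for label in excluded_labels)


def expand_call(excluded_labels: Iterable[str],
                min_depth: int = 0,
                max_depth: int = -1,
                relationship_filter: str = "") -> str:
    """Build the apoc.path.expand line for a start node bound to `n`."""
    return (f'CALL apoc.path.expand(n, "{relationship_filter}", '
            f'"{label_filter(excluded_labels)}", {min_depth}, {max_depth}) '
            f"YIELD path")


if __name__ == "__main__":
    # Reproduces the third line of the example query.
    print(expand_call(["project", "protocol"]))
```

Calling expand_call with max_depth=1 gives the variant that stops at the donor's first-level neighbours.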
This package was created with Cookiecutter.