ExposuresProvider / cam-pipeline

Data loading pipeline for CAM database
https://exposuresprovider.github.io/cam-pipeline/
MIT License

Write tests for cam-pipeline #94

Closed: gaurav closed this issue 5 months ago

gaurav commented 1 year ago

Probably the simplest way of doing this would be to add tests to Souffle so that invalid or suspicious edges are reported somewhere (e.g. any edge that isn't mapped to a Biolink predicate).
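To make the idea concrete, here is a minimal sketch of that check, written in Python rather than Souffle purely for brevity. It assumes edges are available as (subject, predicate, object) triples and that a mapped predicate is one with a `biolink:` prefix. The function name `find_unmapped_edges` and the example triples are hypothetical, not taken from the pipeline.

```python
from typing import Iterable, List, Tuple

# Assumption: an edge is a (subject, predicate, object) triple of strings.
Edge = Tuple[str, str, str]


def find_unmapped_edges(edges: Iterable[Edge], prefix: str = "biolink:") -> List[Edge]:
    """Return the edges whose predicate does not start with the given prefix."""
    return [edge for edge in edges if not edge[1].startswith(prefix)]


# Made-up example data: one mapped edge and one edge still using a raw RO IRI.
example_edges = [
    ("NCBIGene:13870", "biolink:causes", "EXAMPLE:process"),
    ("NCBIGene:13870", "http://purl.obolibrary.org/obo/RO_0002411", "EXAMPLE:process"),
]

suspicious = find_unmapped_edges(example_edges)
for subject, predicate, obj in suspicious:
    print(f"unmapped predicate: {subject} {predicate} {obj}")
```

In Souffle the same check would presumably be a relation over the edge facts that gets written to an output file, so the report can be inspected after each pipeline run.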

We should also set up a test kit in this repository that can query the TRAPI endpoint and confirm that it is working correctly. We already have code for doing that in https://github.com/ExposuresProvider/cam-kp-api/, so we could either move it over as Scala/ZIO or rewrite it in Souffle, Python, or something similarly simple.

gaurav commented 9 months ago

Moving over the existing code is going to be a pain, because it depends heavily on ZIO testing, Circe, and the code in cam-kp-api that models the TRAPI messages.

Instead, I think the right approach is:

  1. Use https://automat.renci.org/cam-kp/1.4/sri_testing_data to download example data.
  2. Build a Scala-CLI ZIO module for querying Automat-CAM-KP with a simple query (see below for some examples) via the /query endpoint.
  3. Create test files for the ZIO module that use it to query an Automat-CAM-KP instance with some example data and compare the output to the expected output.
  4. Initially this will focus on our primary customer, ICEES (see https://github.com/ExposuresProvider/cam-pipeline/issues/101 and https://github.com/ExposuresProvider/cam-kp-api/issues/599), but eventually we expect to enhance these tests as cam-pipeline itself improves.
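As a rough illustration of step 2, the sketch below builds a one-hop TRAPI query and posts it to the /query endpoint. It is written in Python rather than as the proposed Scala-CLI ZIO module, only to keep the example short. The helper names `build_one_hop_query` and `post_query` are hypothetical, and the node keys `n0`/`n1` and edge key `e01` are just a naming convention.

```python
import json
import urllib.request

# Endpoint of the Automat CAM-KP instance discussed in this issue.
QUERY_URL = "https://automat.renci.org/cam-kp/1.4/query"


def build_one_hop_query(subject_id, subject_category, object_category, predicate):
    """Build a TRAPI query: one identified subject, one open object, one edge."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {"categories": [subject_category], "ids": [subject_id]},
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e01": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


def post_query(query, url=QUERY_URL, timeout=60):
    """POST the query as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


query = build_one_hop_query(
    "NCBIGene:13870", "biolink:Gene", "biolink:BiologicalProcess", "biolink:causes"
)
print(json.dumps(query, indent=4))
```

Calling `post_query(query)` would send the request over the network, so a test suite would probably want to keep query construction and the HTTP call separate, as done here, so that the former can be tested offline.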

An example of a simple query that works against Automat CAM-KP, run with curl:

curl -X POST "https://automat.renci.org/cam-kp/1.4/query" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"message\":{\"query_graph\":{\"nodes\":{\"n0\":{\"categories\":[\"biolink:Gene\"],\"ids\":[\"NCBIGene:13870\"]},\"n1\":{\"categories\":[\"biolink:BiologicalProcess\"]}},\"edges\":{\"e01\":{\"subject\":\"n0\",\"object\":\"n1\",\"predicates\":[\"biolink:causes\"]}}}}}"

The same request body, pretty-printed:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"],
                    "ids": ["NCBIGene:13870"]
                },
                "n1": {
                    "categories": ["biolink:BiologicalProcess"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:causes"]
                }
            }
        }
    }
}
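For the "compare the output to the expected output" step, a test could start with a few structural checks on the response before comparing specific results. The sketch below, in Python, assumes the response follows the usual TRAPI layout (`message.knowledge_graph.nodes` and `.edges` keyed by ID, plus a `message.results` list). The function name `check_response` and both sample responses are made up for illustration and are not real output from Automat CAM-KP.

```python
def check_response(response):
    """Return a list of human-readable problems found in a TRAPI response."""
    message = response.get("message") or {}
    knowledge_graph = message.get("knowledge_graph") or {}
    nodes = knowledge_graph.get("nodes") or {}
    edges = knowledge_graph.get("edges") or {}
    results = message.get("results") or []

    problems = []
    if not results:
        problems.append("no results returned")
    for edge_id, edge in edges.items():
        # Every edge should point at nodes that are present in the graph.
        for end in ("subject", "object"):
            if edge.get(end) not in nodes:
                problems.append(f"edge {edge_id}: {end} {edge.get(end)!r} not in nodes")
        # Every edge should carry a Biolink predicate.
        if not str(edge.get("predicate", "")).startswith("biolink:"):
            problems.append(f"edge {edge_id}: predicate is not a Biolink predicate")
    return problems


# Hand-written sample responses (not real service output).
sample_ok = {
    "message": {
        "knowledge_graph": {
            "nodes": {"NCBIGene:13870": {}, "EXAMPLE:process": {}},
            "edges": {
                "e0": {
                    "subject": "NCBIGene:13870",
                    "predicate": "biolink:causes",
                    "object": "EXAMPLE:process",
                }
            },
        },
        "results": [
            {
                "node_bindings": {
                    "n0": [{"id": "NCBIGene:13870"}],
                    "n1": [{"id": "EXAMPLE:process"}],
                }
            }
        ],
    }
}
sample_empty = {"message": {"knowledge_graph": {"nodes": {}, "edges": {}}, "results": []}}

print(check_response(sample_ok))
print(check_response(sample_empty))
```

An empty problem list means the response passed these checks. A real test would go further and compare the returned nodes and edges against the expected output for each example query.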