ExposuresProvider / cam-pipeline

Data loading pipeline for CAM database
https://exposuresprovider.github.io/cam-pipeline/
MIT License

Produce KGX files for import into ORION for eventual hosting on Plater #88

Closed gaurav closed 8 months ago

gaurav commented 1 year ago

We are starting an experiment to get rid of the CAM-KP-API frontend by moving the complex translation from relation-space to Biolink/TRAPI-space into the cam-pipeline backend.

Our current plan is to create a new, clean Snakemake file (or other Makefile-like tool) that will generate the node and edge information in a format that ORION can parse, probably tab-delimited files. Here is an existing ORION importer that may be similar to the one we need to write: https://github.com/RobokopU24/ORION/blob/e674da629259deb69a199637cbdccdcf79dbe5b9/parsers/textminingkp/src/loadTMKP.py

Some notes:

  1. The input doesn't need Biolink types: all identifiers are fed into the Node Normalizer, which provides the Biolink types for them anyway. (We can also opt to turn NodeNorm off, but then we'd need to prepare the MetaKG appropriately ourselves.)
  2. We only need to specify the most specific Biolink predicate for each edge; Plater's transplicer (?) should be able to match these correctly.
  3. ORION doesn't currently have any particular support for or validation of qualified predicates, but as long as the primary predicate is a Biolink predicate, the list of qualifiers should be passed through unchanged.
  4. ORION validates the input against the Biolink model, which is a good thing.
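
As a rough illustration of the notes above, here is a minimal sketch of what the tab-delimited output could look like. Everything in it is an assumption made for illustration: the example identifiers, the column names (`id`, `subject`, `predicate`, `object`, plus one column per qualifier) and the helper names `write_nodes` and `write_edges` are hypothetical, and the exact layout ORION expects has not been confirmed here.

```python
import csv
import io

# Hypothetical example edges; identifiers, predicates and qualifiers are illustrative only.
EDGES = [
    {
        "subject": "UniProtKB:P04637",
        "predicate": "biolink:affects",
        "object": "GO:0006915",
        "qualifiers": {"object_direction_qualifier": "increased"},
    },
    {
        "subject": "UniProtKB:P04637",
        "predicate": "biolink:related_to",
        "object": "GO:0008150",
    },
]


def write_nodes(edges, out):
    """Write one row per distinct identifier.

    No Biolink category column is written, on the assumption that the
    Node Normalizer supplies the types downstream (note 1).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id"])
    node_ids = sorted({e["subject"] for e in edges} | {e["object"] for e in edges})
    for node_id in node_ids:
        writer.writerow([node_id])
    return node_ids


def write_edges(edges, out):
    """Write one row per edge, with one extra column per qualifier key seen."""
    qualifier_cols = sorted({k for e in edges for k in e.get("qualifiers", {})})
    writer = csv.DictWriter(
        out,
        fieldnames=["subject", "predicate", "object"] + qualifier_cols,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for e in edges:
        # Notes 2 and 3: the primary predicate should be a Biolink predicate.
        if not e["predicate"].startswith("biolink:"):
            raise ValueError(f"not a Biolink predicate: {e['predicate']}")
        row = {k: e[k] for k in ("subject", "predicate", "object")}
        # Qualifiers are copied through unchanged; missing ones become empty cells.
        row.update(e.get("qualifiers", {}))
        writer.writerow(row)


if __name__ == "__main__":
    nodes_out, edges_out = io.StringIO(), io.StringIO()
    write_nodes(EDGES, nodes_out)
    write_edges(EDGES, edges_out)
    print(nodes_out.getvalue())
    print(edges_out.getvalue())
```

In a Snakemake rule, the two functions would write to real output files instead of in-memory buffers. The prefix check is only a cheap guard and does not replace ORION's validation against the Biolink model.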

Instead of going through ORION, we could produce data that can be directly imported by Plater, but that would require:

  1. A Neo4J dump of the data
  2. A MetaKG description of the data
  3. SRI testing data for this data
  4. A Metadata JSON file that is used to populate the import data.

gaurav commented 8 months ago

This was implemented in PR https://github.com/ExposuresProvider/cam-pipeline/pull/97 and PR https://github.com/ExposuresProvider/cam-pipeline/pull/100. Closing.