Closed cmungall closed 2 years ago
We will also want to eliminate pseudo-blank nodes
Discussed on call with @kshefchek @mbrush @putmantime @deepakunni3 @yy20716:
Scenario: a reasoner team wants to bring in monarch d2p edges into their graph, together with properties about the diseases.
They run the kgx export command-line tool (possibly via Docker) and specify subjcat=disease, objcat=phenotype, and the desired output format (graphml, csv, rdf, neo4jdump format). They can then load that directly, or possibly massage the output somehow.
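To make the scenario concrete, here is a minimal Python sketch of the filtering step: keep only edges whose subject and object have the requested categories, then serialize them as CSV. It is not the kgx CLI or its API; the node IDs, the `category` property, and the function names `filter_edges` and `edges_to_csv` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical in-memory graph; IDs, names, and predicates are illustrative only.
nodes = {
    "D1": {"category": "disease", "name": "disease one"},
    "P1": {"category": "phenotype", "name": "phenotype one"},
    "G1": {"category": "gene", "name": "gene one"},
}
edges = [
    {"subject": "D1", "predicate": "has_phenotype", "object": "P1"},
    {"subject": "G1", "predicate": "has_phenotype", "object": "P1"},
]


def filter_edges(nodes, edges, subject_category, object_category):
    """Keep edges whose subject and object nodes have the requested categories."""
    return [
        e for e in edges
        if nodes[e["subject"]]["category"] == subject_category
        and nodes[e["object"]]["category"] == object_category
    ]


def edges_to_csv(edges):
    """Serialize edges to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["subject", "predicate", "object"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(edges)
    return buf.getvalue()


# Equivalent in spirit to subjcat=disease, objcat=phenotype.
d2p = filter_edges(nodes, edges, "disease", "phenotype")
print(edges_to_csv(d2p))
```

The gene-to-phenotype edge is dropped because its subject is not a disease; only the disease-to-phenotype edge reaches the CSV.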
I note the monarch-lite made before the last hackathon has equiv edges
we want to avoid this here (the same info can be put in as node properties)
I've made some progress cleaning up the scigraph.ncats.io graph
For example to get gene to disease:
(:gene)-[edge:`http://purl.obolibrary.org/obo/RO_0002326`]->(:disease)
gene to phenotype
(:gene)-[edge:`http://purl.obolibrary.org/obo/RO_0002200`]->(:phenotype)
The inferences for human G2P are more liberal than what we index in Solr for Monarch. There's also mouse and zebrafish data.
disease to phenotype:
(:disease)-[edge:`http://purl.obolibrary.org/obo/RO_0002200`]->(:phenotype)
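The three patterns above differ only in the node labels and the relation IRI, so they can be generated from one template. The sketch below builds a full Cypher query string around each pattern; the `MATCH ... RETURN ... LIMIT` wrapper, the variable names `s` and `o`, and the helper `edge_query` are assumptions added for illustration, not something stated in the thread.

```python
# Relation IRIs copied from the patterns above.
GENE_TO_DISEASE = "http://purl.obolibrary.org/obo/RO_0002326"
HAS_PHENOTYPE = "http://purl.obolibrary.org/obo/RO_0002200"


def edge_query(subject_label, predicate_iri, object_label, limit=10):
    """Build a Cypher query for one subject-edge-object pattern.

    The relationship type is an IRI, so it must be wrapped in backticks.
    """
    return (
        f"MATCH (s:{subject_label})-[edge:`{predicate_iri}`]->(o:{object_label}) "
        f"RETURN s, edge, o LIMIT {int(limit)}"
    )


queries = {
    "gene to disease": edge_query("gene", GENE_TO_DISEASE, "disease"),
    "gene to phenotype": edge_query("gene", HAS_PHENOTYPE, "phenotype"),
    "disease to phenotype": edge_query("disease", HAS_PHENOTYPE, "phenotype"),
}

for name, query in queries.items():
    print(f"{name}: {query}")
```

The resulting strings could be pasted into a Neo4j browser session or passed to a driver; that execution step is left out here because the endpoint details are not given in the thread.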
Great
Remember the spec has snake case for props.
On Mon, Apr 30, 2018, 10:33 Kent Shefchek notifications@github.com wrote:
I've made some progress cleaning up the scigraph.ncats.io graph
- Created clique arrays as node properties, eg someNode.clique = [foo,bar]
- Removed eq|sameAs edges and nodes
- Added inheritance labels, eg diseaseNode.inheritance = "Autosomal Recessive"
- Added frequency, age of onset iri and labels (frequency, frequency_label, onset, onset_label)
- Added publications, evidence codes as node properties (reified edges still there), eg node.sources = [], node.evidence = []
- Added inferred edges for gene to disease, some gene to phenotype (human), along with aggregating source and evidence lists in the process
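The first two bullets (clique arrays, then dropping equivalence edges) can be sketched in Python as below. This is an illustration of the idea only, not the code that was run against the scigraph.ncats.io graph: the predicate names `equivalentClass` and `sameAs`, the dict-based graph layout, and both function names are assumptions. Unlike the cleanup described above, the sketch removes only the equivalence edges and keeps every node.

```python
def build_cliques(equiv_pairs):
    """Group IDs connected by equivalence pairs, using a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equiv_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return {node: sorted(members) for members in groups.values() for node in members}


def collapse_equivalence(nodes, edges, equiv_predicates=("equivalentClass", "sameAs")):
    """Store each node's clique as a property and drop the equivalence edges."""
    pairs = [(e["subject"], e["object"]) for e in edges if e["predicate"] in equiv_predicates]
    cliques = build_cliques(pairs)
    for node_id, props in nodes.items():
        props["clique"] = cliques.get(node_id, [node_id])
    kept = [e for e in edges if e["predicate"] not in equiv_predicates]
    return nodes, kept


# Hypothetical IDs, for illustration only.
nodes = {"A": {}, "B": {}, "C": {}, "P": {}}
edges = [
    {"subject": "A", "predicate": "sameAs", "object": "B"},
    {"subject": "B", "predicate": "equivalentClass", "object": "C"},
    {"subject": "A", "predicate": "has_phenotype", "object": "P"},
]
nodes, kept_edges = collapse_equivalence(nodes, edges)
print(nodes["A"]["clique"], len(kept_edges))
```

A, B, and C end up sharing the same clique array, P gets a clique containing only itself, and only the has_phenotype edge survives.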
For example to get gene to disease:
(:gene)-[edge:`http://purl.obolibrary.org/obo/RO_0002326`]->(:disease)
gene to phenotype
(:gene)-[edge:`http://purl.obolibrary.org/obo/RO_0002200`]->(:phenotype)
The inferences for human G2P are more liberal than what we index in Solr for Monarch. There's also mouse and zebrafish data.
disease to phenotype:
(:disease)-[edge:`http://purl.obolibrary.org/obo/RO_0002200`]->(:phenotype)
View it on GitHub: https://github.com/NCATS-Tangerine/kgx/issues/1#issuecomment-385471023
All new props are snake case. These are all the new ones: clique, frequency, frequency_label, onset, onset_label, sources, evidence, inheritance
After chatting with @putmantime, I realized I'd forgotten the edge property isDefinedBy, which is generated at load time by SciGraph. I'll rerun and make these changes:
- sources: source of the data (ontology, rdf), replaces isDefinedBy
- publications: literature references, replaces what I was calling "sources"
- evidence: ECO codes, no change from current
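One detail worth noting in this rename is that "sources" is both an old name and a new name, so the two renames have to happen at the same time rather than one after the other. A minimal Python sketch, assuming properties are held in a plain dict (the `RENAMES` table, `rename_props`, and the example values are hypothetical):

```python
# Old property name -> new property name, per the plan above.
RENAMES = {
    "isDefinedBy": "sources",   # source of the data (ontology, rdf)
    "sources": "publications",  # literature references
}


def rename_props(props, renames=RENAMES):
    """Return a new dict with keys renamed in a single pass.

    Building a fresh dict means the old "sources" value is moved to
    "publications" before anything is written under the new "sources" key,
    so neither value overwrites the other.
    """
    return {renames.get(key, key): value for key, value in props.items()}


# Placeholder values, for illustration only.
old_props = {
    "isDefinedBy": ["example-ontology.owl"],
    "sources": ["PMID:placeholder"],
    "evidence": ["ECO:placeholder"],
}
new_props = rename_props(old_props)
print(new_props)
```

Renaming in place, key by key, would risk clobbering the literature references when isDefinedBy is moved to "sources" first.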
Added inheritance labels, eg diseaseNode.inheritance = "Autosomal Recessive"
@mbrush we should add this as a node property under disease in the model
how is this coming along?
looking at http://neo4j.monarchinitiative.org/
seems we are
lacking names
Every node should have a name property unless the rdfs:label was null. Can you give me an example?
lacking CURIE IDs
I won't be able to add CURIE IDs with the current approach
only have G2P2D?
This is what I was able to do as a first pass, but we can add more before the hackathon
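On the CURIE question above: if node and edge identifiers are available as full IRIs, contracting them to CURIEs is a prefix substitution. The Python sketch below shows that step in isolation, under the assumption that a prefix map is supplied; the `PREFIX_MAP` contents and the `contract` helper are illustrative, and nothing here describes what SciGraph or KGX does internally.

```python
# Hypothetical prefix map: IRI base -> CURIE prefix (including the colon).
PREFIX_MAP = {
    "http://purl.obolibrary.org/obo/RO_": "RO:",
    "http://purl.obolibrary.org/obo/HP_": "HP:",
}


def contract(iri, prefix_map=PREFIX_MAP):
    """Contract an IRI to a CURIE, trying the longest matching base first.

    IRIs with no matching base are returned unchanged.
    """
    for base in sorted(prefix_map, key=len, reverse=True):
        if iri.startswith(base):
            return prefix_map[base] + iri[len(base):]
    return iri


print(contract("http://purl.obolibrary.org/obo/RO_0002200"))
print(contract("http://example.org/unmapped/123"))
```

Trying longer bases first avoids a short, generic base shadowing a more specific one when both are present in the map.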
On 8 May 2018, at 17:33, Kent Shefchek wrote:
> lacking names
> Every node should have a name property unless the rdfs:label was null. Can you give me an example?
see screenshot
> lacking CURIE IDs
> I won't be able to add CURIE IDs with the current approach
hmm, should we explore going the original route of querying into in-memory or files and transforming those
> only have G2P2D?
> This is what I was able to do as a first pass, but we can add more before the hackathon
ok!
That node looks okay to me:
Keeping this issue open until we can have KGX read directly from Monarch and create a BioLink compliant Monarch KG.
@deepakunni3 Could this ticket please get some breadcrumbs dropped to help trace the artifacts leading to its resolution?
Closing for now, as there is an SRI Reference KG in KGEA, I believe. It is also going through some revisions and refactoring alongside the Dipper refactor.
https://github.com/Knowledge-Graph-Hub/sri-reference-kg https://archive.translator.ncats.io/
Advice: @kshefchek
aka "monarch lite transform"
will assign @deepakunni3 @yy20716