Closed benwbooth closed 7 years ago
+1
Since relations have structure (class-subclass), they are in the graph as both nodes and edges. In order to generate the relation subclass closure we need to first return the node with the same IRI and traverse its parents.
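To make the traversal concrete, here is a minimal sketch of a subclass closure over a relation's parents. It assumes a plain in-memory map from an IRI to its direct parent IRIs instead of the real SciGraph/Neo4j graph, and the class name, method name, and EX: identifiers are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RelationClosureSketch {
    // Hypothetical stand-in for the graph: IRI -> IRIs of its direct parents.
    // Returns the starting IRI plus every ancestor reachable through parents.
    static Set<String> closure(String iri, Map<String, List<String>> parents) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(iri);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            // Set.add returns false for already-visited IRIs, which guards against cycles.
            if (seen.add(current)) {
                queue.addAll(parents.getOrDefault(current, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up relation hierarchy, for illustration only.
        Map<String, List<String>> parents = Map.of(
            "EX:child_rel", List.of("EX:parent_rel"),
            "EX:parent_rel", List.of("EX:root_rel"));
        // prints [EX:child_rel, EX:parent_rel, EX:root_rel]
        System.out.println(closure("EX:child_rel", parents));
    }
}
```

The starting IRI is included in its own closure here; whether the loader does the same is not something this sketch claims.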
@kshefchek That makes sense. Thanks for the explanation!
Although I'm also thrown off by the node ID part: the node ID is numeric, so I'm not sure how this is working unless the method is referencing the IRI property.
The GraphTransactionalImpl module in SciGraph defines an idMap, which is a hash map from String IDs to numeric node IDs. Every time you add a node to a graph, it adds a String ID mapping for it. The String ID is not actually stored in the Neo4j database.
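As an illustration of the idea (not SciGraph's actual implementation), a String-to-numeric ID map could look roughly like this. The class name, method names, and the counter-based ID assignment are assumptions made for the sketch; in the real graph the numeric IDs come from Neo4j.

```java
import java.util.HashMap;
import java.util.Map;

final class IdMapSketch {
    // String ID (e.g. an IRI) -> numeric node ID.
    private final Map<String, Long> idMap = new HashMap<>();
    // Assumption: IDs are handed out by a simple counter in this sketch.
    private long nextId = 0;

    // Returns the existing numeric ID for the String ID, or registers a new one.
    long getOrCreate(String stringId) {
        Long existing = idMap.get(stringId);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idMap.put(stringId, id);
        return id;
    }

    // Returns the numeric ID, or null if the String ID was never registered.
    Long lookup(String stringId) {
        return idMap.get(stringId);
    }

    public static void main(String[] args) {
        IdMapSketch ids = new IdMapSketch();
        long first = ids.getOrCreate("http://purl.obolibrary.org/obo/GO_0000578");
        long again = ids.getOrCreate("http://purl.obolibrary.org/obo/GO_0000578");
        // The same String ID always resolves to the same numeric ID.
        System.out.println(first == again);
    }
}
```

This is why a lookup by IRI can succeed even though the IRI string itself is only a key in the side map.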
Regardless, we do want to enable this to work on relations since we index the relation subclass closure, for example:
This isn't really a good example, but we use it when querying homology relations.
I vaguely recall now an external map file that contains IRI to node/edge ID mappings; this is likely what is in the SciGraphIdMap file.
Edit: yes this seems to be the case:
import java.io.File;
import java.util.Map;
import org.mapdb.DB;
import org.mapdb.DBMaker;

// Open the SciGraphIdMap MapDB file and print every entry it contains.
File dbLocation = new File("/home/kshefchek", "SciGraphIdMap");
DB db = DBMaker.newFileDB(dbLocation).closeOnJvmShutdown().transactionDisable().mmapFileEnable().make();
Map<String, Object> map = db.getHashMap("io.scigraph.neo4j.IdMap");
// Each key is a String ID (an IRI); each value is the mapped ID.
for (Map.Entry<String, Object> entry : map.entrySet()) {
    String key = entry.getKey();
    Object value = entry.getValue();
    System.out.println(key + ": " + value);
}
Outputs:
http://purl.obolibrary.org/obo/GO_0000578: 319175
http://purl.obolibrary.org/obo/GO_0000502: 575893
http://purl.obolibrary.org/obo/GO_0000503: 575891
... etc
This is going to sound hypocritical given the current state of the code, but I would like to eventually move away from hard-coding subject, object, etc. These refer to a specific Solr schema, and we can foresee cases where we will want to reuse this code with an entirely different schema. This is probably not needed for this PR and can wait. @cmungall, your thoughts?
Eventually I think it would be nice to make much of this configurable, for example something like this:
query: |
  MATCH path=(foo)<-[bar:Owl:relationship]-(baz)
  RETURN DISTINCT path,
  foo, bar, baz
expandedFields:
  foo_closure:
    relations:
      - OWL:subClassOf
      - BFO:part_of
      - Some:transitiveProperty
    type: closure
    label: foo_label
    map: foo_map
  foo_gene:
    relations:
      - RO:has_gene
    type: direct
    label: foo_gene_label
    map: foo_gene_map
@benwbooth could you post an example config? I will test it on a sample graph with our queries.
@kshefchek Here is the example config I used for testing:
query: |
  MATCH path=(subject:gene)-[relation:RO:0002206]->(object:`anatomical entity`)
  RETURN DISTINCT path,
  subject, object, relation,
  'gene' AS subject_category,
  'anatomy' AS object_category,
  'direct' AS qualifier
object_closure: "rdfs:subClassOf|BFO:0000050"
Thanks! Is the default behavior to fall back to the hardcoded relations?
Yes, and the hardcoded relations are automatically added to any custom relations you specify as well.
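Roughly, the merge behaves like the sketch below: the custom value is split on "|" and added to the defaults. The class and method names are hypothetical, the default list is passed in because the actual hardcoded relations are not listed in this thread, and handling of the entailment operator is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ClosureRelationsSketch {
    // Combines the default relations with a pipe-separated custom value such as
    // "rdfs:subClassOf|BFO:0000050". Defaults are always kept; duplicates collapse.
    static Set<String> effectiveRelations(List<String> defaults, String custom) {
        Set<String> result = new LinkedHashSet<>(defaults);
        if (custom != null) {
            // "|" is a regex metacharacter, so it must be escaped for split.
            for (String relation : custom.split("\\|")) {
                String trimmed = relation.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "EX:default_rel" is a placeholder, not one of the real hardcoded relations.
        List<String> defaults = List.of("EX:default_rel");
        System.out.println(effectiveRelations(defaults, "rdfs:subClassOf|BFO:0000050"));
        // With no custom value, only the defaults are used.
        System.out.println(effectiveRelations(defaults, null));
    }
}
```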
Ran locally, all looks good to me!
This pull request adds some extra syntax to the monarch-cypher-queries yaml that allows specifying closure patterns for subject, object, relation, and evidence as a resolved cypher query.
The main changes are adding a resolveRelationships function to GolrLoader which calls cyperUtil.resolveRelationships, and parses out the resolved types from the returned string. This approach allows usage of the ! entailment operator.
I also added fields subject_closure, object_closure, relation_closure and evidence_closure to GolrCypherQuery, which is used to parse the yaml files.
I had to add curie-util 0.0.2 as an explicit dependency, otherwise version 0.0.1 would be brought in as a transitive dependency, and golr-loader seems to be coded for 0.0.2.
I wrote a test case in GolrLoaderTest which should test that object_closure is working. I hand-wrote a test graph which is set up in GolrLoadSetup. The entire GolrLoaderTest module was marked as @Ignore, so I removed it and added @Ignore to each individual test, then added my new test. I'm not sure why all these tests are being ignored, but it could be that the fixtures simply need to be updated and that hasn't been done yet.
I'm using the gene-anatomy query from monarch-cypher-queries as the test query. The resulting object_closure value I get after running the query on my test graph contains:
so it looks like the closure query is working.
There is a weird behavior in GolrLoader.serializerRow that I don't quite understand. If the cypher query returns a relation, the code gets the iri property of the relation, then attempts to find a node in the graph with a String ID that matches the relation's iri value. I'm not sure why it's doing this. As a workaround, I had to alter the gene-anatomy query used in the test case so that it does not return the matched relation. Why would a node have an ID that matches the iri of one of its relations? Maybe someone else can shed some light on this. Here is the code from GolrLoader.serializerRow:
Fixes #17.