NCATS-Tangerine / beacon-aggregator

A web service that operates over the Beacon network to provide a single software interface over all the Beacons.

Make cliques be disconnected subgraphs induced by "same as" edges. #89

Open lhannest opened 5 years ago

lhannest commented 5 years ago

Right now, cliques are saved as arrays of CURIEs paired with matrices of beacon IDs, where each column of the matrix corresponds to the entry at the same index in the CURIE array, indicating which beacons have produced which CURIEs.

It would be nicer to represent cliques as subgraphs connected by "same as" edges. That way, merging two cliques would be trivial: just add an edge between two otherwise disconnected subgraphs. Retrieving and adding cliques would become a bit more complicated than it is now, but not by much. When adding nodes, we could ensure that each subgraph is a tree by always adding edges in the direction (new node)-[same as]->(pre-existing node), and then take the "clique leader" to be the root of the tree. When retrieving nodes, something like this query should suffice:

// $curie is the CURIE being looked up (assumes nodes store it in a "curie" property)
MATCH (n {curie: $curie})-[:`same as`*0..]-(m) RETURN DISTINCT m;
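To make the tree idea concrete, here is a minimal in-memory sketch in Python, not the project's actual Neo4j/Spring implementation. The class `CliqueGraph` and its methods are hypothetical names. Each node keeps at most one outgoing "same as" edge (new node to pre-existing node), so following outgoing edges always ends at the root, which is taken as the clique leader. Merging adds one edge from the root of one tree into the other tree, and retrieval is a plain traversal of "same as" edges in either direction, mirroring the variable-length query above.

```python
from collections import defaultdict


class CliqueGraph:
    """Hypothetical model of cliques as trees of "same as" edges."""

    def __init__(self):
        self.parent = {}                    # node -> node it is "same as" (None for a root)
        self.neighbours = defaultdict(set)  # undirected view used for retrieval

    def _link(self, source, target):
        # Record the directed edge (source)-[same as]->(target).
        self.parent[source] = target
        self.neighbours[source].add(target)
        self.neighbours[target].add(source)

    def add_node(self, curie, same_as=None):
        """Add `curie`, optionally as (new node)-[same as]->(pre-existing node)."""
        if curie in self.parent:
            raise ValueError(f"{curie} already exists")
        if same_as is None:
            self.parent[curie] = None       # starts its own clique as the leader
            return
        if same_as not in self.parent:
            raise ValueError(f"{same_as} must already exist")
        self._link(curie, same_as)

    def leader(self, curie):
        """Follow outgoing "same as" edges up to the root of the tree."""
        while self.parent[curie] is not None:
            curie = self.parent[curie]
        return curie

    def merge(self, a, b):
        """Merge the cliques of `a` and `b` with a single new edge."""
        root_a, root_b = self.leader(a), self.leader(b)
        if root_a == root_b:
            return  # already one clique; another edge would create a cycle
        # root_a had no outgoing edge, so the result is still a tree rooted at root_b.
        self._link(root_a, b)

    def clique(self, curie):
        """All nodes reachable over "same as" edges in either direction."""
        seen, stack = {curie}, [curie]
        while stack:
            for nxt in self.neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

The parent pointers make this essentially a union-find forest without path compression: finding the leader is a walk to the root, and because every node has at most one outgoing edge, no explicit cycle check is needed beyond comparing roots before a merge.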

It would also allow us to add ontological depth to cliques with "sub class of" edges. I think this is an important aspect of cliques that we've so far overlooked.
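One way to picture this is a graph with typed edges where clique membership follows only "same as" edges, while "sub class of" edges link one clique to another. The sketch below is a hypothetical in-memory illustration in Python; `TypedGraph`, `clique` and `direct_superclasses` are made-up names, and the CURIEs in the usage are placeholders rather than real mappings.

```python
from collections import defaultdict

SAME_AS = "same as"
SUBCLASS_OF = "sub class of"


class TypedGraph:
    """Hypothetical graph whose edges carry a type such as "same as"."""

    def __init__(self):
        self.out = defaultdict(set)  # node -> {(edge_type, target)}
        self.inc = defaultdict(set)  # node -> {(edge_type, source)}

    def add_edge(self, source, edge_type, target):
        self.out[source].add((edge_type, target))
        self.inc[target].add((edge_type, source))

    def clique(self, node):
        """Nodes connected by "same as" edges only, in either direction."""
        seen, stack = {node}, [node]
        while stack:
            current = stack.pop()
            for edge_type, other in self.out[current] | self.inc[current]:
                if edge_type == SAME_AS and other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen

    def direct_superclasses(self, node):
        """Cliques reached by one outgoing "sub class of" edge from any member."""
        result = set()
        for member in self.clique(node):
            for edge_type, target in self.out[member]:
                if edge_type == SUBCLASS_OF:
                    result.add(frozenset(self.clique(target)))
        return result
```

Because the traversal filters on edge type, adding "sub class of" edges never changes which CURIEs belong to a clique; the hierarchy sits between cliques rather than inside them.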

It appears we can set up Spring to use multiple Neo4j databases (https://michael-simons.github.io/neo4j-sdn-ogm-tips/using_multiple_session_factories.html), so it may be nice to have one database solely for cliques and another for general data.

lhannest commented 5 years ago

This would also allow us to add CURIEs to cliques without specifying a source beacon, which we currently cannot do. For example, if we see that a clique contains OMIM.DISEASE:137220, then we can infer that it should also contain OMIM:137220.
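A minimal sketch of that inference in Python, assuming a hand-maintained table of prefixes known to share the same local identifiers. `EQUIVALENT_PREFIXES`, `split_curie` and `inferred_curies` are hypothetical names, and the table holds only the one mapping from the example above.

```python
# Hypothetical table: prefixes whose local IDs mean the same thing under another prefix.
EQUIVALENT_PREFIXES = {
    "OMIM.DISEASE": "OMIM",
}


def split_curie(curie):
    """Split "PREFIX:local" on the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


def inferred_curies(clique):
    """CURIEs implied by prefix equivalence that the clique does not already hold."""
    inferred = set()
    for curie in clique:
        prefix, local_id = split_curie(curie)
        equivalent = EQUIVALENT_PREFIXES.get(prefix)
        if equivalent is not None:
            inferred.add(f"{equivalent}:{local_id}")
    return inferred - set(clique)
```

With cliques stored as "same as" subgraphs, each inferred CURIE could simply be added as a new node with a "same as" edge to the CURIE it was derived from, with no source beacon attached.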