Open amykglen opened 2 years ago
leaving some notes for @rcpeene here since I'm going to be out for a bit (back Aug. 31) - things to do/figure out for this issue:
- the /meta_knowledge_graph endpoint (probably work with @edeutsch to figure this out)
- which meta edges ARAX can answer knowledge_type: 'inferred' queries for; @dkoslicki and/or @chunyuma would likely be able to help figure that out
- since our KPs now differ depending on maturity level, does each maturity/branch need its own meta KG? (@edeutsch can likely answer this)
Yes, I suppose yes. But I was sort of under the impression that each instance/endpoint has its own meta KG? When I restart an ARAX endpoint and do a test query, it seems to go through a process of checking the meta kgs of all its KGs. I was sort of thinking that the output of that process would/could be a merged meta KG?
So perhaps the meta KG should be computed dynamically by each endpoint as it manages its KPs as it already does? And thus maturity/branch is not really relevant except insofar as different endpoints will access different KP endpoints because of that?
I may not be understanding the situation well.
Yes, I think that is correct @edeutsch. Unless we decide otherwise, the meta KGs that are examined and used will be based on the KP endpoints that our instance decides to access (which will be constrained on the basis of version and maturity). I believe KPSelector is the class that checks each KP's meta KG in the way you're referring to. I intend to put the logic that creates an ARAX meta-KG there. The result would be that each ARAX instance has a different meta-kg.json stored somewhere that we decide.
This may lead to endpoints that should have the same meta KG ending up with somewhat different ones. But such are the risks of a distributed system. I think it's the best way. It would be good to document the caching strategy so we're all aware of it.
I have implemented the logic that fetches each available KP's meta-KG, builds a large super-meta-KG for ARAX, and stores it in a file, meta_kg.json, in ARAX/ARAXQuery/Expand. This logic merges meta-nodes that have the same node key by producing a set union of their id_prefixes property and a concatenated list of their respective attributes lists. For meta-edges, it combines ones that have the same subject--object--predicate triple by creating a concatenated list of their respective attributes lists. There is not yet logic to handle the knowledge_types property of meta-edges.
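The merge rules described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function name `merge_meta_kgs` is hypothetical, and it assumes each KP's meta-KG is a dict with a `nodes` mapping (keyed by category) and an `edges` list, as in the TRAPI meta knowledge graph schema.

```python
from typing import Dict, List, Tuple


def merge_meta_kgs(kp_meta_kgs: List[dict]) -> dict:
    """Merge several KP meta-KGs into one (hypothetical helper, for illustration)."""
    merged_nodes: Dict[str, dict] = {}
    merged_edges: Dict[Tuple[str, str, str], dict] = {}

    for meta_kg in kp_meta_kgs:
        # Meta-nodes with the same key: union the id_prefixes, concatenate attributes.
        for node_key, node in meta_kg.get("nodes", {}).items():
            target = merged_nodes.setdefault(
                node_key, {"id_prefixes": set(), "attributes": []}
            )
            target["id_prefixes"].update(node.get("id_prefixes") or [])
            target["attributes"].extend(node.get("attributes") or [])

        # Meta-edges with the same triple: concatenate attributes.
        for edge in meta_kg.get("edges", []):
            triple = (edge["subject"], edge["predicate"], edge["object"])
            target = merged_edges.setdefault(
                triple,
                {
                    "subject": edge["subject"],
                    "predicate": edge["predicate"],
                    "object": edge["object"],
                    "attributes": [],
                },
            )
            target["attributes"].extend(edge.get("attributes") or [])

    # Sets are not JSON-serializable, so convert them to sorted lists before writing.
    for node in merged_nodes.values():
        node["id_prefixes"] = sorted(node["id_prefixes"])

    return {"nodes": merged_nodes, "edges": list(merged_edges.values())}
```

The `or []` guards are there because some KPs may return null for `attributes` or `id_prefixes`; whether that actually happens in practice is an assumption of this sketch.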
It's worth noting that the resulting ARAX meta-KG is very large: 76 meta-nodes and 48,526 meta-edges.
To make explicit the caching mechanism that this system currently uses: it piggybacks off of the mechanism that KPSelector uses to load the "meta-map". A new meta-kg.json is made and written any time the meta-map is refreshed. This happens if the meta-map hasn't been refreshed for more than 24 hours, or if the existing meta-map doesn't contain some KPs that were found from Smart API at the onset of the query. In other words, the meta-KG is recreated at least every 24 hours, and also any time a new valid KP is found in the Smart API registry.
Two additional notes:
- This means that if something changes about a KP which makes it newly available to an instance of ARAX, the meta-KG will be recreated. For instance, if a KP's TRAPI version is upgraded or if a new compatible maturity is added to it.
- Currently, the meta-KG is being recreated every time I run a query. This is because the meta-map is being refreshed every time. In turn, this is because two valid KPs are showing up in the Smart API registry but they fail to provide a proper meta-KG every time the meta-map is refreshed, and they are therefore not added to the meta-map. This might be considered a bug or undesirable behavior. If we deem it so, it might be worth creating a separate issue for that.
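The refresh rule described in this comment can be sketched as a small predicate. This is only an illustration under assumptions: `needs_refresh` and its parameters are hypothetical names, and the actual check in KPSelector may be structured differently.

```python
import time
from typing import Iterable, Optional

REFRESH_INTERVAL_SECONDS = 24 * 60 * 60  # the 24-hour window mentioned above


def needs_refresh(
    last_refreshed: Optional[float],
    cached_kps: Iterable[str],
    registry_kps: Iterable[str],
    now: Optional[float] = None,
) -> bool:
    """Return True if the meta-map (and so the meta-KG) should be rebuilt.

    Hypothetical helper: timestamps are seconds since the epoch, and KPs are
    identified by plain strings.
    """
    if last_refreshed is None:
        return True  # nothing cached yet
    now = time.time() if now is None else now
    too_old = (now - last_refreshed) > REFRESH_INTERVAL_SECONDS
    # Any KP seen in the registry but absent from the cached meta-map forces a refresh.
    missing_kps = set(registry_kps) - set(cached_kps)
    return too_old or bool(missing_kps)
```

Under this rule, a KP that is listed in the registry but never makes it into the meta-map keeps `missing_kps` non-empty, so the predicate returns True on every query, which matches the behavior in the second note.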
thanks, this is a good explanation. What is the performance impact of this? i.e. how long does it take to do this rebuilding?
> It's worth noting that the resulting ARAX meta-KG is very large: 76 meta-nodes and 48,526 meta-edges.
What is the size of RTX-KG2 metaKG alone?
RTX-KG2's meta-KG contains 57 meta-nodes and 45,2813 meta-edges, as of my last check, making up a significant majority of ARAX's meta-KG. As for time performance, the duration of the rebuilding process is highly variable since it depends on many requests to KPs. Building the meta-KG seems to take about 10 seconds, with a similar 10 seconds for refreshing the meta-map.
I've added a bit more logic to remove 'null' properties that don't need to exist in the Meta-KG, and to properly assign values to the knowledge_types property of meta-edges. After my meeting with @dkoslicki, my understanding is that most meta-edges in ARAX should have only 'lookup' (default) as their knowledge_types values. The exceptions are meta-edges that have a subject--predicate--object triple of the form ChemicalMixture--ameliorates--DiseaseOrPhenotypicTrait or DiseaseOrPhenotypicTrait--is_ameliorated_by--ChemicalMixture. This includes subject, object, and predicate values that are descended from the categories I just mentioned. Meta-edges with these triples have both 'lookup' and 'inferred' as their knowledge_types. Unless someone else has feedback, I think the Meta-KG is complete with this implementation.
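The knowledge_types rule above could look something like the sketch below. It is an illustration only: `knowledge_types_for` is a hypothetical function, the category and predicate names are copied as written in this thread, and the descendant table is a hand-written toy stand-in for a real Biolink hierarchy lookup.

```python
from typing import Dict, List, Set, Tuple

# Root triples from the comment above (names as written in the thread).
INFERRED_ROOT_TRIPLES: List[Tuple[str, str, str]] = [
    ("ChemicalMixture", "ameliorates", "DiseaseOrPhenotypicTrait"),
    ("DiseaseOrPhenotypicTrait", "is_ameliorated_by", "ChemicalMixture"),
]

# Toy descendant table with placeholder names, for this example only.
# A real implementation would query the Biolink model instead.
TOY_DESCENDANTS: Dict[str, Set[str]] = {
    "ChemicalMixture": {"ExampleMixtureSubtype"},
    "DiseaseOrPhenotypicTrait": {"ExampleDiseaseSubtype"},
}


def knowledge_types_for(
    subject: str, predicate: str, obj: str, descendants: Dict[str, Set[str]]
) -> List[str]:
    """Return the knowledge_types for one meta-edge (hypothetical helper)."""

    def within(value: str, root: str) -> bool:
        # A value matches if it is the root itself or one of its descendants.
        return value == root or value in descendants.get(root, set())

    for s_root, p_root, o_root in INFERRED_ROOT_TRIPLES:
        if within(subject, s_root) and within(predicate, p_root) and within(obj, o_root):
            return ["lookup", "inferred"]
    return ["lookup"]  # the default for every other meta-edge
```

Passing the descendant table in as an argument keeps the rule itself separate from however the hierarchy is actually obtained.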
Further discussion: it looks like the Meta-KG creation process takes much longer than I originally estimated. It takes >30 seconds in most of the tests I ran. After talking to @amykglen, we decided to change the caching mechanism so that the Meta-KG only gets updated once every 24 hours.
Code has been tested and pushed. A PR has been issued.
Should this be marked high priority? Just wondering what else might be gating on this issue.
it's just waiting on me to review/merge. though it's not high priority, since it's an optional feature. I'll merge it soon
one thing I think we should do before merging issue1879 is combining the two requests to get KPs' meta KGs. it looks like right now there's one request to get info for Expand's 'meta map' and another to build this new meta KG, even though the building of those two things is paired; think we should combine these since it's somewhat time-consuming to get all KPs' meta KGs.
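A rough sketch of what combining the requests could look like: fetch each KP's meta KG once, then derive both structures from the same responses. Everything here is hypothetical, including the function name `refresh_both`, the injected `fetch_meta_kg` callable, and the simplified shape of the "meta map" (KP name to set of triples); the real meta map in Expand may hold different information.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def refresh_both(
    kp_names: Iterable[str],
    fetch_meta_kg: Callable[[str], Optional[dict]],
) -> Tuple[Dict[str, Set[Triple]], List[Triple]]:
    """Build a simplified meta map and a merged triple list from one fetch per KP."""
    fetched: Dict[str, dict] = {}
    for kp in kp_names:
        meta_kg = fetch_meta_kg(kp)  # the only request made for this KP
        if meta_kg is not None:  # skip KPs that fail to return a usable meta KG
            fetched[kp] = meta_kg

    # Simplified stand-in for the meta map: which triples each KP supports.
    meta_map: Dict[str, Set[Triple]] = {
        kp: {(e["subject"], e["predicate"], e["object"]) for e in mkg.get("edges", [])}
        for kp, mkg in fetched.items()
    }

    # Simplified stand-in for the merged meta KG: the union of all KPs' triples.
    merged_triples: List[Triple] = sorted(set().union(*meta_map.values()))
    return meta_map, merged_triples
```

Injecting the fetch function makes it easy to check that each KP is contacted exactly once, which is the point of the change.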
TRAPI 1.3 recommends but does not require that we provide a meta_knowledge_graph for ARAX that is a union of all of its KPs' meta_knowledge_graphs: https://github.com/NCATSTranslator/ReasonerAPI/commit/e2ed87aa4f02dac55dcbd8eac7e190b8c188fbdd
maybe this union meta KG should be automatically generated and periodically updated by the KPSelector module, since that's already where we pull all KPs' meta KGs and cache info from them.
we're also supposed to indicate in the meta KG for which meta edges ARAX can answer knowledge_type: 'inferred' queries: https://github.com/NCATSTranslator/ReasonerAPI/pull/333/files