amykglen closed this issue 3 months ago
realized perhaps the ranker needs access to provided_by and publications for edges. so the idea might require one little tweak there.
I think this is a terrific idea! Three thumbs up! I can add a query option to the API to control whether this happens. I'm thinking the default should be to INclude metadata, with an option to turn it off? Maybe something like minimal_metadata = True? But feel free to suggest alternatives, I don't feel strongly. We could allow the option for ARAX, too.
nice, that sounds great to me!
I could add a class ARAXDecorator (or something like that), that would just need to be plugged into ARAXQuery I guess (automatically called after ranking and everything is done, depending on the minimal_metadata option?)
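A minimal sketch of how such a hook might be wired up. The ARAXDecorator below is a stand-in written for illustration, not the actual ARAX class: the decorate method, the lookup callable, the dict-based nodes, and the finish_query helper are all assumptions. Only the minimal_metadata option name and the "decorate by default" behavior come from the discussion above.

```python
from typing import Callable, Dict


class ARAXDecorator:
    """Hypothetical stand-in for the proposed class; the real one may differ."""

    def __init__(self, lookup: Callable[[str], Dict[str, str]]):
        # lookup maps a node ID to its 'additional' attributes (assumed interface)
        self.lookup = lookup

    def decorate(self, nodes: Dict[str, dict]) -> None:
        """Attach the additional attributes to each node in place."""
        for node_id, node in nodes.items():
            node.update(self.lookup(node_id))


def finish_query(nodes: Dict[str, dict], options: dict,
                 decorator: ARAXDecorator) -> Dict[str, dict]:
    """Final step after ranking: decorate unless minimal_metadata is set."""
    # The default is to include metadata, so a missing option means "decorate".
    if not options.get("minimal_metadata", False):
        decorator.decorate(nodes)
    return nodes
```

With this shape, a caller that wants lean responses passes {"minimal_metadata": True} and the decorator is never invoked.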
One thing to note: the ranker in resultify() does use some of the additional properties. One example is the number of publications, which informs literature-based rankings. Others, which I don’t think would be affected by what you are proposing (but thought I’d mention it), are things like chi-square values, probabilities, log ratios, etc., which are added by overlay stuff.
ok, changes for the bulk of this issue and #1359 are in master and are ready to be rolled out to ARAX and the KG2 API.
large queries using plover seem to be moving much faster now that there's almost no 'postprocessing' step. I'm seeing about 30 seconds for the plover-related portion of the second hop in the query in #1370 vs. about 120 seconds before. my internet isn't super fast though, and the vast majority of that 30 seconds is spent just receiving plover's response (which is fairly large now since edge/node objects are returned instead of just IDs), so I'm expecting the time to be better than that for arax.ncats.io.
I made it so that plover returns edges in full (including publications, provided_by) but nodes only with their 'core' properties (name, category). nodes are then decorated with additional attributes at the very end of Expand (by calling ARAXDecorator). I think this node decoration should ultimately happen at the end of ARAXQuery instead, after results have been filtered (I just wasn't totally sure where to plug it in). figuring that can be done as time allows.
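For illustration, node decoration against a sqlite file could look roughly like the sketch below. The table layout (a nodes table with an id column and a JSON attributes column), the decorate_nodes function name, and the sample data are all assumptions made up for this example; the actual kg2c.sqlite schema and the real ARAXDecorator may be organized differently.

```python
import json
import sqlite3
from typing import Dict


def decorate_nodes(conn: sqlite3.Connection, nodes: Dict[str, dict],
                   batch_size: int = 500) -> None:
    """Add stored attributes to each node dict in place, looking IDs up in batches."""
    node_ids = list(nodes)
    for start in range(0, len(node_ids), batch_size):
        batch = node_ids[start:start + batch_size]
        # One "?" placeholder per ID keeps the query parameterized.
        placeholders = ",".join("?" for _ in batch)
        rows = conn.execute(
            f"SELECT id, attributes FROM nodes WHERE id IN ({placeholders})",
            batch,
        )
        for node_id, attributes_json in rows:
            nodes[node_id].update(json.loads(attributes_json))


# Tiny in-memory stand-in for kg2c.sqlite (schema and rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, attributes TEXT)")
conn.execute(
    "INSERT INTO nodes VALUES (?, ?)",
    ("EX:1", json.dumps({"description": "example description",
                         "iri": "https://example.org/EX_1"})),
)

# A node as it might arrive with only its 'core' properties.
nodes = {"EX:1": {"name": "node one", "category": "biolink:NamedThing"}}
decorate_nodes(conn, nodes)
```

Batching the IN clause keeps each statement under sqlite's bound-parameter limit, which matters less once only the filtered nodes are being decorated.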
also, if/when the minimal_metadata parameter is set up that will be very easy to plug into expand.
I have just rolled out master to all deployments including production and /kg2. Please test.
awesome, thanks! looking good to me! I suppose I'll leave this issue open until we get the call to ARAXDecorator moved to ARAXQuery (which should help free up memory/save further time) and get the minimal_metadata flag set up
@amykglen what's the state of this issue?
I think we should close this one - there hasn't really been demand for it and it could be complicated to implement
with #1359 and #1370 in mind, I've been playing with this idea: might it make sense to decorate nodes/edges involved in an ARAX query answer with the nice 'additional' attributes provided by KG2/c (description, iri, publications, provided_by, etc.) only at the very end of an ARAX query, after the results have been filtered down and everything?
I believe ARAX modules downstream of expand don't use this information(?), and most of the time a huge chunk (99% for medium/large queries!) of those nodes/edges are thrown out at the end of ARAX's processing anyway, when it filters results.
meaning, the system could look something like this:
[...] (id, name, category for nodes and id, subject, object, predicate for edges)
[...] expand (?)
[...] kg2c.sqlite)
this would eliminate the plover 'postprocessing' step (which can be time consuming for large queries, when it has to look up thousands/millions of nodes/edges in sqlite). and the number of nodes/edges that ARAX would need to decorate would be quite small really, since ARAX automatically filters results down to 100 (for JSON queries at least). so looking up these nodes/edges in sqlite could happen very quickly. (maybe we wouldn't even need Redis.)
I think this would result in a big reduction in memory consumption for large queries as well, and might help with #1370.
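To make the size argument concrete, here is a rough sketch of working out which nodes/edges would still need decoration once results are filtered. The result layout (a score plus lists of node_ids and edge_ids), both function names, and the toy data are assumptions for illustration only; they are not ARAX's actual result format.

```python
from typing import List, Set, Tuple


def keep_top_results(results: List[dict], limit: int = 100) -> List[dict]:
    """Keep the highest-scoring results (assumes each result has a 'score')."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:limit]


def ids_needing_decoration(results: List[dict]) -> Tuple[Set[str], Set[str]]:
    """Collect the node and edge IDs referenced by the surviving results."""
    node_ids: Set[str] = set()
    edge_ids: Set[str] = set()
    for result in results:
        node_ids.update(result["node_ids"])
        edge_ids.update(result["edge_ids"])
    return node_ids, edge_ids


# Toy answer: 1,000 results, each using one shared node, one unique node, one edge.
results = [
    {"score": i / 1000, "node_ids": ["N:0", f"N:{i + 1}"], "edge_ids": [f"E:{i}"]}
    for i in range(1000)
]
kept = keep_top_results(results, limit=100)
node_ids, edge_ids = ids_needing_decoration(kept)
print(len(node_ids), len(edge_ids))  # 101 100
```

In this toy case only 101 of 1,001 nodes and 100 of 1,000 edges would ever be looked up, which is the kind of reduction the proposal is counting on.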
if we were worried about the fact that our additional attributes wouldn't be available for people who hit up the KG2 API directly, maybe we could add some sort of special parameter that specified whether to decorate or not decorate (so ARAX would just always tell the KG2 API to 'not decorate').
any thoughts?