Open andrewsu opened 2 years ago
This was the intended behavior regarding #324. The TRAPI logs usually note that there were too many entities after the first hop to continue.
Perhaps we could return a different HTTP status code (not 200) to make it clearer that there was an issue?
I'm not sure about returning results, though, because we wouldn't have completed the query graph: the records available after the first hop wouldn't fully map onto the query graph or provide the answers it asked for...
More info:
The API response: response.txt
Console logs:
@colleenXu What do you think about returning the message.knowledge_graph portion of the response with the results of the first hop? The message.results section would still be empty.
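A hedged sketch of what such a partial response could look like, assuming the standard TRAPI message shape (all node, edge, and log contents below are made up for illustration, not taken from the actual query):

```python
# Illustrative sketch of a TRAPI response that returns the first-hop
# knowledge graph while leaving message.results empty. Every identifier
# here is hypothetical.
partial_response = {
    "message": {
        "knowledge_graph": {
            "nodes": {
                "MONDO:0000001": {"categories": ["biolink:Disease"]},
                "NCBIGene:0000001": {"categories": ["biolink:Gene"]},
            },
            "edges": {
                "e01-record-1": {
                    "subject": "MONDO:0000001",
                    "predicate": "biolink:condition_associated_with_gene",
                    "object": "NCBIGene:0000001",
                },
            },
        },
        # The query graph was never completed, so no bindings are returned.
        "results": [],
    },
    "logs": [
        {
            "level": "ERROR",
            "message": "Max number of entities exceeded (1000) in 'e02'",
        }
    ],
}
```

The user would then see which first-hop entities were retrieved (and could narrow the query with predicates) even though no complete answers exist.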
First step: run the first hop and see whether it returns a mix of predicates. That will help us tell whether a log listing the predicates in the executed hop(s) would be useful or not.
Remember that this is just a two-hop query, but the same thing can happen in longer linear queries as well...
In discussion with @colleenXu, two things to change:
Revisiting this issue... I ran the two-hop query above through the ARS: https://arax.ci.transltr.io/?r=cbc0e82e-8397-4293-b11c-00e40859169a. (EDIT: this link actually corresponds to the query in the related issue #330 on Fanconi anemia, not the psoriasis query above.) As designed, it returns zero results with the following error message:
Error: Max number of entities exceeded (1000) in 'e02'
The one-hop query for e01 indeed returns 1022 results: https://arax.ci.transltr.io/?r=65737549-f327-4ff7-9006-9d0ab4daf236. The validator (results injected by the ARS) returns some useful stats; we should consider returning this info directly in the logs (as suggested in the comment above):
"validation_result": {
"message": "There were validator errors",
"n_edges": 2054,
"n_nodes": 1044,
"provenance_summary": {
"n_sources": 26,
"predicate_counts": {
"biolink:affected_by": 2,
"biolink:caused_by": 60,
"biolink:condition_associated_with_gene": 379,
"biolink:contribution_from": 928,
"biolink:occurs_together_in_literature_with": 483,
"biolink:related_to": 181,
"biolink:subclass_of": 21
},
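If we wanted to emit these stats in our own logs rather than rely on the validator, a minimal sketch could look like this (kg_stats is a hypothetical helper name; it assumes a standard TRAPI knowledge_graph dict):

```python
from collections import Counter

def kg_stats(knowledge_graph):
    """Summarize a TRAPI knowledge_graph in the style of the validator
    output above: node/edge totals plus per-predicate edge counts."""
    nodes = knowledge_graph.get("nodes", {})
    edges = knowledge_graph.get("edges", {})
    return {
        "n_nodes": len(nodes),
        "n_edges": len(edges),
        "predicate_counts": dict(
            Counter(edge["predicate"] for edge in edges.values())
        ),
    }

# Tiny illustrative KG (identifiers are made up):
sample_kg = {
    "nodes": {"n0": {}, "n1": {}, "n2": {}},
    "edges": {
        "e0": {"subject": "n0", "predicate": "biolink:related_to", "object": "n1"},
        "e1": {"subject": "n1", "predicate": "biolink:related_to", "object": "n2"},
        "e2": {"subject": "n0", "predicate": "biolink:caused_by", "object": "n2"},
    },
}
```

Logging this summary alongside the max-entities error would directly show the user which predicates to prune.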
In addition, with the completion of the scoring overhaul in #634, the ranking of the 1022 results actually looks pretty good. So in the case where e01 retrieves more than our limit of entities, should we simply calculate scores for all intermediate answers, trim to the max allowed entities, and then continue with e02? Of course we'd want to return some sort of warning, but my guess is that this must be what the other ARAs are doing in response to the two-hop query above...
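The score-then-trim idea could be sketched roughly as follows (trim_intermediate_entities and score_fn are hypothetical names; the real logic would live in BTE's edge-execution loop):

```python
def trim_intermediate_entities(records, score_fn, max_entities=1000):
    """Hypothetical sketch: score the entities produced by one hop, keep the
    top `max_entities`, and return them plus a warning so the next edge can
    still be executed instead of aborting the whole query."""
    scored = sorted(records, key=score_fn, reverse=True)
    kept = scored[:max_entities]
    warnings = []
    if len(records) > max_entities:
        warnings.append(
            f"Trimmed intermediate entities from {len(records)} to "
            f"{max_entities} (keeping the highest-scoring ones) before "
            f"executing the next edge."
        )
    return kept, warnings
```

For the query above, this would cut the 1022 e01 answers down to 1000 and let e02 proceed, with the warning surfaced in the TRAPI logs.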
There's also a separate question of prioritization. Does (or can) Sentry track how often we hit this type of limit?
Sentry unfortunately doesn't seem to provide a solid way of searching for specific errors, so it's hard to track frequency of specific kinds of failures.
Anecdotally, we see this kind of error relatively frequently in the queues for sync queries to the /v1/team/Service Provider/query endpoint.
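One workaround could be to attach a stable, searchable tag to each event instead of relying on Sentry's free-text error search. A sketch of such a classifier (classify_error and the pattern list are hypothetical; its output could be attached to an event with sentry_sdk.set_tag before capturing):

```python
import re

# Hypothetical mapping from raw BTE error messages to stable tag values.
# With a tag like bte_error_class on each Sentry event, the frequency of
# each failure class becomes a simple tag-based search.
ERROR_PATTERNS = [
    (re.compile(r"Max number of entities exceeded \(\d+\) in '\w+'"),
     "max_entities_exceeded"),
    (re.compile(r"timed? ?out", re.IGNORECASE), "timeout"),
]

def classify_error(message):
    """Return a stable tag value for a raw error message."""
    for pattern, tag in ERROR_PATTERNS:
        if pattern.search(message):
            return tag
    return "unclassified"
```

This would let us answer the prioritization question above with actual counts per error class rather than anecdotes.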
I executed the following two-hop query. The number of entities exceeds our cap after executing the first edge, and BTE returns an essentially empty result (no results, no KG). We should consider providing some results back so that the user can adjust the query (by adding predicates, for example) and let it successfully finish. Desired behavior needs some discussion...
(I would submit an ARS link, but I'm having issues running queries at the moment? Could be something unrelated to this specific issue?)