edeutsch opened this issue 2 years ago
Thanks to Finn and Amy, this now seems resolved in master:
Retrieved a message with 34 results:
- 2.679 ?
- 2.608 ?
- 2.605 ?
- 2.129 ?
- 1.657 ?
- 1.524 ?
- 1.340 ?
- 1.340 ?
- 1.340 ?
- 1.340 ?
- 1.340 ?
- 1.268 ?
- 1.268 ?
- 1.265 ?
- 1.265 ?
- 1.265 ?
- 1.020 ?
- 0.790 ?
- 0.790 ?
- 0.536 ?
- 0.332 ?
- 0.317 ?
- 0.317 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
- 0.268 ?
After score removal:
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
-None ?
Create a new request with the previous message and a workflow to rerank (overlay_connect_knodes,complete_results,score)
Results (19):
- 1.000 Ebola hemorrhagic fever
- 0.947 viral hemorrhagic fever
- 0.895 Filoviridae infectious disease
- 0.807 hemorrhagic fever
- 0.684 infectious disease or post-infectious disorder
- 0.649 viral infectious disease
- 0.632 viral disease or post-viral disorder
- 0.632 zoonosis
- 0.614 arbovirus infection
- 0.579 Non-Neoplastic Disorder by Special Category
- 0.526 Mononegavirales infectious disease
- 0.474 Non-Neoplastic Disorder
- 0.421 primary viral infectious disease
- 0.316 Disorder characterized by fever
- 0.298 Test Result
- 0.211 Inflammatory Response
- 0.158 Hyperthermia
- 0.088 infectious disease
- 0.070 Inflammation
Data: https://arax.ncats.io/api/arax/v1.2/response/36278
GUI: https://arax.ncats.io/beta/?r=36278
Aragorn result: https://arax.ncats.io/beta/?r=99adc6b5-0803-4c52-972e-fe587744a7aa
ARAX reranked result: https://arax.ncats.io/beta/?r=36278
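The test output above lists each result's score, strips the scores, and then resubmits the message with a rerank workflow. A minimal sketch of the first two steps, assuming TRAPI 1.2-style results where each result dict carries an optional `score` key. The function names are hypothetical and not taken from the actual rerank test script:

```python
import copy

def summarize_scores(message):
    """Return (number of results, list of their scores) for a TRAPI-style message."""
    results = message.get("results") or []
    return len(results), [result.get("score") for result in results]

def strip_scores(message):
    """Return a deep copy of the message with every result's score set to None."""
    stripped = copy.deepcopy(message)  # leave the original message untouched
    for result in stripped.get("results") or []:
        result["score"] = None
    return stripped

msg = {"results": [{"score": 2.679}, {"score": 0.268}]}
print(summarize_scores(msg))
print(summarize_scores(strip_scores(msg)))
```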
We will want to revisit this after the relay to see if we want to undo the hacks we needed to implement to get this to work.
Solved for now (in /beta, not production), but we should probably revisit this once there is a Translator-wide decision on whether edge_bindings to non-existent qedges are legal.
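To see how often this situation occurs in a given message, one could list the edge_binding keys that have no matching qedge in the query graph. A sketch assuming the TRAPI 1.2 layout (`query_graph.edges` keyed by qedge ID, `results[i].edge_bindings` keyed by qedge ID); the key `support0` in the usage example is made up for illustration:

```python
def dangling_edge_bindings(message):
    """Map result index -> sorted edge_binding keys that have no matching qedge."""
    qedge_keys = set(message.get("query_graph", {}).get("edges", {}))
    dangling = {}
    for index, result in enumerate(message.get("results") or []):
        missing = sorted(set(result.get("edge_bindings", {})) - qedge_keys)
        if missing:
            dangling[index] = missing
    return dangling

message = {
    "query_graph": {"edges": {"e00": {}}},
    "results": [
        {"edge_bindings": {"e00": [{"id": "kg1"}]}},
        {"edge_bindings": {"e00": [], "support0": [{"id": "kg2"}]}},  # hypothetical key
    ],
}
print(dangling_edge_bindings(message))
```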
It is interesting to note that their first/top result is the exact disease that was queried against, so all of its edges are self-edges. I can see this kind of thing potentially throwing off both automated systems and human reviewers. It feels like it should not be a thing...?
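Results like that top one could be flagged automatically by checking whether every knowledge-graph edge a result binds has the same subject and object. A sketch assuming TRAPI 1.2 field names (`knowledge_graph.edges`, `subject`, `object`); the node IDs in the usage example are placeholders:

```python
def all_bound_edges_are_self_edges(message, result):
    """True if the result binds at least one KG edge and every bound edge is a self-edge."""
    kg_edges = message["knowledge_graph"]["edges"]
    bound_ids = [
        binding["id"]
        for bindings in result.get("edge_bindings", {}).values()
        for binding in bindings
    ]
    edges = [kg_edges[edge_id] for edge_id in bound_ids if edge_id in kg_edges]
    if not edges:
        return False  # nothing to judge
    return all(edge["subject"] == edge["object"] for edge in edges)

message = {"knowledge_graph": {"edges": {
    "e1": {"subject": "X:1", "object": "X:1"},
    "e2": {"subject": "X:1", "object": "X:2"},
}}}
print(all_bound_edges_are_self_edges(message, {"edge_bindings": {"e00": [{"id": "e1"}]}}))
```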
I would have thought so too, but I recall hearing Chris Mungall, grand master of ontologies, state that all entities are always a subclass of themselves. He stated it as an obvious fact that everyone knows. I hope I am not misremembering.
But I should ask. I'll ask on the testing channel.
all entities are always a subclass of themselves
This is how infinities are born... ;-)
This example brings to light one possible complication that some might object to:
The (vague) ask was to "rerank results".
We are actually not so much reranking the results as completely discarding the results and recomputing the results and ranking those.
The original set contained 34 results. After our "reranking", there are only 19. This is because the original 34 appear to include what might be considered duplicate results. If we were truly "reranking the results", we would output 34 results.
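One plausible way to check for such duplicates is to compare results by their node bindings alone, ignoring edge bindings and scores. This assumes that "duplicate" means "binds the same KG nodes to the same qnodes"; it has not been confirmed that this is what explains the drop from 34 to 19:

```python
def node_binding_signature(result):
    """Hashable summary of which KG node IDs are bound to which qnode keys."""
    return frozenset(
        (qnode_key, binding["id"])
        for qnode_key, bindings in result.get("node_bindings", {}).items()
        for binding in bindings
    )

def count_distinct_results(results):
    """Number of results that differ in their node bindings."""
    return len({node_binding_signature(result) for result in results})

results = [
    {"node_bindings": {"n0": [{"id": "A"}], "n1": [{"id": "B"}]}},
    {"node_bindings": {"n1": [{"id": "B"}], "n0": [{"id": "A"}]}},  # same nodes, other order
    {"node_bindings": {"n0": [{"id": "A"}], "n1": [{"id": "C"}]}},
]
print(count_distinct_results(results))
```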
I don't know if this will be viewed favorably or unfavorably, but there it is. It will cause discussion.
The question will come up, and I don't know the answer: can we leave the results unchanged (rather than discarding and recomputing them) and just rerank them, so that we would emit 34 results in this case?
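For comparison, a "pure" rerank would keep every incoming result and only assign new scores and a new order. A minimal sketch, where `score_fn` is a hypothetical stand-in for whatever per-result scoring ARAX would apply; this is not how the current workflow behaves:

```python
def rerank_in_place(results, score_fn):
    """Return every input result with a new score, sorted best first; none are dropped."""
    rescored = []
    for result in results:
        updated = dict(result)  # shallow copy so the caller's dicts are untouched
        updated["score"] = score_fn(result)
        rescored.append(updated)
    return sorted(rescored, key=lambda r: r["score"], reverse=True)

results = [{"id": i, "weight": w} for i, w in enumerate([0.1, 0.9, 0.5])]
reranked = rerank_in_place(results, lambda r: r["weight"])
print([r["id"] for r in reranked])
```

Because nothing is merged or filtered, 34 results in means 34 results out.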
For reference, the workflow we used for this is:
"workflow": [
  {
    "id": "overlay_connect_knodes"
  },
  {
    "id": "complete_results"
  },
  {
    "id": "score"
  }
]
(as discussed)
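Building this request body in code avoids hand-edited JSON problems such as trailing commas. A sketch assuming the request body is simply the previous TRAPI `message` plus a `workflow` list of `{"id": ...}` operations, as in the fragment above; `build_rerank_request` is a hypothetical helper, and submitting the body is left out because the exact query endpoint URL is not given here:

```python
import json

RERANK_WORKFLOW = ("overlay_connect_knodes", "complete_results", "score")

def build_rerank_request(message, operation_ids=RERANK_WORKFLOW):
    """Wrap an existing TRAPI message in a request body that carries a workflow."""
    return {
        "message": message,
        "workflow": [{"id": op_id} for op_id in operation_ids],
    }

body = build_rerank_request({"query_graph": {"nodes": {}, "edges": {}}})
print(json.dumps(body["workflow"], indent=2))  # always valid JSON, no trailing commas
```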
As a slightly separate issue, I tried running with just:
"workflow": [
  {
    "id": "score"
  }
]
and got:
2022-02-04T06:09:00.152997 ERROR: An uncaught error occurred: 'd347fc18-3bfd-4f5c-8fc0-d6ce45a2d2f6':
Traceback (most recent call last):
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_query.py", line 753, in execute_processing_plan
    ranker.aggregate_scores_dmk(response)
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 621, in aggregate_scores_dmk
    _score_networkx_graphs_by_frobenius_norm])))
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 618, in <lambda>
    scorer_func),
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 160, in _score_result_graphs_by_networkx_graph_scorer
    results)
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 67, in _get_weighted_graphs_networkx_from_result_graphs
    result))
  File "/mnt/data/orangeboard/beta/RTX/code/UI/OpenAPI/python-flask-server/openapi_server/controllers/../../../../../ARAX/ARAXQuery/ARAX_ranker.py", line 49, in _get_weighted_graph_networkx_from_result_graph
    kg_edge = kg_edge_id_to_edge[edge_binding.id]
KeyError: 'd347fc18-3bfd-4f5c-8fc0-d6ce45a2d2f6'
If someone has time, it would be nice for this at least not to produce an unsightly stack trace, but rather a clear English error message.
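The traceback ends at `kg_edge_id_to_edge[edge_binding.id]`, i.e. a result binds an edge ID that is not in the knowledge graph. One way to get a readable message instead is to check all bound IDs up front. A sketch only: `lookup_bound_edges` is a hypothetical helper, not existing ARAX code, and the caller would still need to route the message into the response's normal error reporting:

```python
def lookup_bound_edges(kg_edge_id_to_edge, edge_binding_ids):
    """Return (edges, None) on success, or (None, readable message) if any ID is missing."""
    missing = [eid for eid in edge_binding_ids if eid not in kg_edge_id_to_edge]
    if missing:
        return None, (
            f"Cannot score this result: {len(missing)} edge binding(s) point to edges "
            f"that are not in the knowledge graph (first missing ID: '{missing[0]}')."
        )
    return [kg_edge_id_to_edge[eid] for eid in edge_binding_ids], None

edges, error = lookup_bound_edges({"e1": {}}, ["e1", "d347fc18-3bfd-4f5c-8fc0-d6ce45a2d2f6"])
print(error)
```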
And maybe someone has ideas on how to make this better. In principle, we could inspect, keep, and use their weights rather than throwing them out and starting over.
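One simple way to "keep and use their weights" would be to blend the upstream score with the ARAX score. A sketch under two assumptions: upstream scores are first rescaled to [0, 1] by dividing by the maximum (the Aragorn scores above go up to 2.679), and a fixed `weight` is an acceptable mixing rule. Both helpers are hypothetical:

```python
def normalize_by_max(scores):
    """Rescale non-negative scores so the largest becomes 1.0."""
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]

def blend_scores(original, recomputed, weight=0.5):
    """Weighted average of an upstream score and a recomputed score, both in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if original is None:  # upstream provided no score; fall back to ours
        return recomputed
    return weight * original + (1.0 - weight) * recomputed

upstream = normalize_by_max([2.679, 1.340, 0.268])
print([blend_scores(u, r) for u, r in zip(upstream, [0.9, 1.0, 0.1])])
```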
@edeutsch I just tried running this with just score and it worked for me: https://arax.ncats.io/beta/?r=36489
However, since we don't see any scores we recognize in the results, we rank everything as 1.
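A rank-based normalization shows why "nothing recognized" collapses to a uniform score. This is an illustration, not ARAX's actual ranker: each result gets the fraction of results scoring at or below it, so the best result gets 1.0, ties share a value, and if every raw score is equal (e.g. all default to the same value because no attribute was recognized), every result gets 1.0:

```python
def rank_normalize(raw_scores):
    """Map each raw score to the fraction of results scoring at or below it."""
    n = len(raw_scores)
    return [sum(1 for other in raw_scores if other <= s) / n for s in raw_scores]

print(rank_normalize([3, 1, 2]))
print(rank_normalize([0, 0, 0]))  # nothing distinguishes the results
```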
If we instead run the workflow:
"workflow": [
  {
    "id": "overlay_connect_knodes"
  },
  {
    "id": "score"
  }
]
We can get ranks without nuking the results: https://arax.ncats.io/beta/?r=36488
Oh, the above was run with the BTE JSON. I just realized that you ran the example that Chris posted. I got the same error when running that.
Ah, interesting idea, running complete_results. But as you can see, I'm getting stack traces with my test queries.
@edeutsch What is weird is that when I run the code locally, I don't get the error; it seems to work fine and reranks correctly.
Oops, it looks like the error came from my edit to the rerank test script. I changed endpoint_url but didn't notice it was used again later to submit the query. It's fixed now, and it looks like it works, at least with the clinical_DCP example: https://arax.ncats.io/beta/?r=36497
I tried to run an Aragorn message through reranking and it didn't go so well. I used 99adc6b5-0803-4c52-972e-fe587744a7aa and the result was:
Ideas?