Open mbrush opened 5 months ago
Based on my recent testing, the ARS is not dropping support paths. Answers with many paths are typically lower in Sugeno and therefore not on the front page. I believe in your screenshot, the sorting may have been by Evidence and was prior to the score implementation.
https://ui.test.transltr.io/main/results?l=Common%20Cold&i=MONDO:0005709&t=0&r=0&q=9aefda7f-5337-4710-8698-182b29e8c1a1 > sort by the evidence column
Acebutolol - returned by ARAX only, with a .95 score. I believe since it was only one ARA and no other scoring components it was ranked 81 with sugeno .95 based on the O&O/Appraiser/ARS Sugeno pipeline
Methoxyflurane - returned by ARAX only, with a .93 score. I believe since it was only one ARA and no other scoring components it was ranked 107 with sugeno .93 based on the O&O/Appraiser/ARS Sugeno pipeline
the same applies to asthma: the results with more evidence/paths are not on the first page
UI: https://ui.test.transltr.io/main/results?l=Asthma&i=MONDO:0004979&t=0&r=0&q=5c2b8789-9ae8-43ce-b11c-116578d863a8 > sort by the Evidence Column
however it's even less clear how they were ranked, given that they were returned by multiple ARAs and those ARAs' scores span a range:
troleandomycin - ARAX score .63, BTE score .82, sugeno .93, rank 145
Triamcinolone - ARAX score .84, Improving score .39, BTE score .82, sugeno .98, rank 36
i get the non-UI scores by using the "ARS merge result summary" Colab (https://colab.research.google.com/drive/1kKC0rCnL18z3sgDsD7P2bLpNoI-wtN6C?usp=sharing), which produced the spreadsheets here: https://drive.google.com/drive/folders/16BzfiYzDfFrNE1hB3DSggHY1qKRS_Bhj?usp=drive_link
after sorting in the UI by evidence, i took the top evidence substances and did a "find" search in the Excel spreadsheets and looked across the row.
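A minimal sketch of that lookup, assuming a spreadsheet has been exported to CSV. The column names (`name`, `arax_score`, `improving_score`, `bte_score`, `sugeno`, `rank`) are hypothetical and the real export may be laid out differently; the two sample rows reuse the troleandomycin and Triamcinolone values quoted above.

```python
import csv
import io

# Hypothetical export layout; the real spreadsheet columns may differ.
# Values are the ones quoted in this thread for the asthma query.
SAMPLE_CSV = """name,arax_score,improving_score,bte_score,sugeno,rank
troleandomycin,0.63,,0.82,0.93,145
Triamcinolone,0.84,0.39,0.82,0.98,36
"""

def lookup_rows(csv_text, names):
    """Return the rows whose name matches any of `names`, case-insensitively."""
    wanted = {n.lower() for n in names}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["name"].lower() in wanted]

# Names taken from the top of the UI list after sorting by Evidence.
rows = lookup_rows(SAMPLE_CSV, ["Troleandomycin", "Triamcinolone"])
for row in rows:
    print(row["name"], "sugeno", row["sugeno"], "rank", row["rank"])
```

This only automates the "find and read across the row" step; it says nothing about how the sugeno score or rank is derived.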
so, systematically, i think this is all working as designed. do i understand the score/rank? no
another way to look at it is by viewing the ARAs' top results:
ARAX: top result Corticotropin, this is the support graph
BTE: top result Fluticasone, this is the support graph
I don't believe this is an ARS merge issue. Based on the above:
it may be different for Prod. attempting to test all of this in Prod now.
The Problem:
I don't have hard evidence to support or quantify this, just my impressions and anecdotal comparisons. But I recall when I was doing more QA/testing last fall, seeing very many Results that had 30, 40, 50, even hundreds of support paths. These days, it is uncommon to find results with support paths in the double digits.
I have a few screengrabs from last fall to support my concern, which show the support path numbers we saw at that time. The image below shows the top results for "what may treat Cerebral Palsy?":
Compare this to what I see in the top page of results for this query today (in test):
This is the most extreme example I had noted - but there are several others I could share that support my concern.
Also, note that there is no overlap in top 10 results, to allow for direct comparison of paths for a given result, but here is what the Triamcinolone result looks like today (15 paths compared to 154 last fall)
Given that support paths are the central focus of our UI, and the unique value we bring in Translator, I think it is worth some effort to understand if this drop is real/concerning, determine why it is happening, and figure out if/how we want to address it.
Several possible reasons have been proposed to explain what might be happening:
I suspect that #2, 3, and 4 above are contributing somewhat to the drop, in particular #3, as I recall seeing many results with long lists of paths based on this "ChemtreatsPhenoOfDisease" BTE template (where hop1 always came from semmed).
But I would like to be sure that #1 is not a significant contributor, because this would represent a loss of legitimate and valuable support paths that would help users trust and understand results.
Proposed Action:
Are there some tests that could be run (by ARS folks perhaps) to assess the change in support path numbers over time, and try to understand if we may be losing paths due to time limits or other performance/caching "improvements" that have been implemented in the last year? @MarkDWilliams @ShervinAbd92 Or is this something that each ARA might have to test on their own (@cbizon @andrewsu)?
Also curious what @sstemann has to say about this - given her UI testing experience/expertise.
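As a rough illustration of the kind of test proposed above, here is a sketch that compares per-result support path counts between two snapshots of the same query. The function name and the dictionary input format are assumptions made for illustration, not an existing ARS or ARA tool; the only data point used is the Triamcinolone example from this issue (154 paths last fall, 15 today).

```python
def path_count_changes(before, after):
    """Compare support-path counts per result between two snapshots.

    `before` and `after` map result name -> number of support paths.
    Returns {name: (before, after, fraction_retained)} for results that
    appear in both snapshots; results present in only one are skipped.
    """
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        b, a = before[name], after[name]
        # Avoid dividing by zero when a result had no paths before.
        retained = a / b if b else None
        changes[name] = (b, a, retained)
    return changes

# Hypothetical snapshot dictionaries built from the one comparison in this issue.
last_fall = {"Triamcinolone": 154}
today = {"Triamcinolone": 15}

changes = path_count_changes(last_fall, today)
for name, (b, a, retained) in changes.items():
    print(f"{name}: {b} -> {a} paths")
```

Because the top 10 results do not overlap between the two time points, a comparison like this would need the full result lists, not just the first page.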