Open colleenXu opened 6 months ago
Addition! For TestCase_26, it looks like the input curie has an extra trailing space: "input_curie": "MONDO:0018958 ". And the corresponding assets (174-177) all have this extra space.
This probably explains why the tools are skipping the query or have no results/throw errors for all the assets
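A quick check along these lines could flag such assets before a test run. This is only a sketch: the function name `find_untrimmed_curies`, the asset record layout, and the `id` values are assumptions for illustration, not the actual structure used in the Tests repo. Only the field names `input_curie` and `output_curie` and the CURIE values appear in this thread.

```python
def find_untrimmed_curies(assets, fields=("input_curie", "output_curie")):
    """Return (asset_id, field, value) for each CURIE with leading/trailing whitespace.

    `assets` is assumed to be a list of dicts; this layout is hypothetical.
    """
    problems = []
    for asset in assets:
        for field in fields:
            value = asset.get(field)
            # A CURIE that changes when stripped carries stray whitespace.
            if isinstance(value, str) and value != value.strip():
                problems.append((asset.get("id"), field, value))
    return problems


# Hypothetical records; the "id" values are made up for this example.
example_assets = [
    {"id": "Asset_174", "input_curie": "MONDO:0018958 "},
    {"id": "Asset_69", "input_curie": "MONDO:0007147", "output_curie": "MESH:D053260"},
]

flagged = find_untrimmed_curies(example_assets)
for asset_id, field, value in flagged:
    print(f"{asset_id}: {field} has stray whitespace: {value!r}")
```

Reporting the offending values rather than silently stripping them keeps the fix in the source sheet, where the extra space was introduced.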
@sierra-moxon could this issue potentially be moved to the feedback repo for TAQA?
See some of the other issues for specific explanations of some of these universal failures. e.g. https://github.com/NCATSTranslator/Tests/issues/93 for Asset NCATSTranslator/Feedback#79.
Re: Asset NCATSTranslator/Feedback#69 "Acceptable: Soot_treats_Obstructive_Sleep_Apnea"
Why is this acceptable? Shouldn't it be "Never Show"? Where is the research that shows soot treats OSA?? Or other inference? If anything, soot causes OSA. Currently the answer from Translator for "What drugs treat OSA" does not include soot, so I cannot query the provenance via Translator. https://ui.ci.transltr.io/main/results?l=Obstructive%20Sleep%20Apnea%20Syndrome&i=MONDO:0007147&t=0&r=0&q=670443ba-f001-450b-9709-c2d9d54c7d65
Asset NCATSTranslator/Feedback#69 is a duplicate of Asset NCATSTranslator/Feedback#68 (also "Acceptable: Soot_treats_Obstructive_Sleep_Apnea"), which a number of Tools are passing.
This is the Information Radiator description for Asset NCATSTranslator/Feedback#69 https://informationradiator.renci.org/test-runs/37/tests/5975#log-0 Calling ARS Test Runner with: { "environment": "ci", "predicate": "treats", "runner_settings": [ "inferred" ], "expected_output": "Acceptable", "input_curie": "MONDO:0007147", "output_curie": "MESH:D053260" }
This is the (blank!) Information Radiator description for Asset NCATSTranslator/Feedback#68 https://informationradiator.renci.org/test-runs/37/tests/5974#log-0 No logs
Maybe some tools are passing NCATSTranslator/Feedback#68 b/c it is blank??
@jaredroach The difference between Asset NCATSTranslator/Feedback#68 and NCATSTranslator/Feedback#69 is that the former is NeverShow and the latter is Acceptable. Both are listed because they could be perceived as "correct" by multiple different user personas. An argument could be made that the NeverShow test should never happen, but that is more a question for TAQA.
The Information Radiator is not without its bugs, and that is what is happening with the "No logs". Unless something catastrophic has happened with the tests, there should always be some logs, but you just have to refresh the page if it ends up showing you that message.
@maximusunc If I understand it correctly, then a Tool will fail an "Acceptable" test if it does not report the result in the top 50%. That means that Tools that do not report "soot" as a top-50% treatment for OSA are going to get dinged. Which is definitely not the intent of these Tests. We should delete Asset NCATSTranslator/Feedback#69.
1_TopAnswer: The expected output is in the top 10% of results
2_Acceptable: The expected output is in the top 50% of results
3_BadButForgivable: The expected output is either not present or in the bottom 50% of results
4_NeverShow: The expected output is not in the results
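To make the consequence of these rules concrete, here is a minimal sketch of how a pass/fail decision could follow from them. It is not the Test Runner's actual code: the function name `passes`, the 1-based `rank` argument, and the handling of boundaries (exactly 10% or 50% counts as inside) are all assumptions. The labels are written without their numeric prefixes, as in the "expected_output" field of the logged payload.

```python
def passes(expected_output, rank, total):
    """Decide pass/fail for one asset under the four rules listed above.

    rank  -- 1-based position of the expected output in the results,
             or None if it is absent (hypothetical convention)
    total -- number of results returned
    """
    if rank is None:
        in_top_10 = in_top_50 = False
    else:
        fraction = rank / total
        in_top_10 = fraction <= 0.10
        in_top_50 = fraction <= 0.50

    if expected_output == "TopAnswer":
        return in_top_10
    if expected_output == "Acceptable":
        return in_top_50
    if expected_output == "BadButForgivable":
        # Passes when absent or ranked in the bottom 50%.
        return not in_top_50
    if expected_output == "NeverShow":
        return rank is None
    raise ValueError(f"unknown expected_output: {expected_output!r}")


# A tool that never returns soot fails the Acceptable asset
# but passes the NeverShow asset for the same pair.
print(passes("Acceptable", None, 100))
print(passes("NeverShow", None, 100))
```

Under these assumptions, no single result list can pass both an Acceptable and a NeverShow asset for the same input/output pair, which is the conflict raised above for the soot assets.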
Hi @jaredroach The reason I put soot treats sleep apnea as a mechanistic acceptable answer is that soot is likely one of the causes, so by doing operations between different results (including the cause) I am likely to get better results for the graph that I would be looking for.
@sandrine-muller-research OK. But you are not disagreeing with me that Asset NCATSTranslator/Feedback#69 should be deleted, are you? We don't want to penalize tools that don't report soot.
When this sheet was made, the rule "2_Acceptable: The expected output is in the top 50% of results" was not defined like that, and, if my memory serves me well, we had another definition of acceptable which was dependent on the persona. I will remove those lines and build another suite from them that will be focused on user preference and may have inconsistent results depending on personas. Those tests will be different from a pass/fail test.
In the latest run (3/10), I noticed that 24 tests were not passed by any tool. These tests may warrant a closer look to figure out what's going on.