Noticed that all tools are failing some tests

colleenXu commented 6 months ago

In the latest run (3/10), I noticed that 24 tests were not passed by any tool. These tests may warrant a closer look to figure out what's going on.

Case Number	Asset Number	Name
8	8	TopAnswer: fingolimod_treats_Multiple_Sclerosis
8	9	TopAnswer: natalizumab_treats_Multiple_Sclerosis
21	14	TopAnswer: interferon-beta_1a_treats_Multiple_Sclerosis
19	16	TopAnswer: desferrioxamine_treats_Aceruloplasminemia
32	32	TopAnswer: Lactase_treats_Lactose_intolerance
12	69	Acceptable: Soot_treats_Obstructive_Sleep_Apnea
23	73	TopAnswer: Mipomersen_treats_Homozygous_Familial_Hypercholesterolemia
0	79	TopAnswer: progestin_treats_Premature_Menopause
8	116	Acceptable: Anticholinergic_agents_treats_Multiple_Sclerosis
9	180	Acceptable: Pegloticase_treats_Gout
20	183	NeverShow: Riley-day_Syndrome_treats_Hereditary_Sensory_And_Autonomic_Neuropathy
5	186	TopAnswer: Insulin_human_treats_Diabetes_Mellitus
34	262	TopAnswer: AZD_3355_treats_Gastroesophageal_Reflux_Disease
34	264	TopAnswer: Talcid_treats_Gastroesophageal_Reflux_Disease
19	303	TopAnswer: deferipone_treats_Aceruloplasminemia
19	304	TopAnswer: deferasirox__treats_Aceruloplasminemia
21	306	TopAnswer: interferon-beta_1b_treats_Multiple_Sclerosis
37	315	TopAnswer: Fluticasone_treats_Asthma
37	316	Acceptable: Albuterol_(salbutamol)_treats_Asthma
39	322	Acceptable: Erythromycin_treats_Idiopathic_bronchiectasis
40	325	Acceptable: Fostamatinib_treats_Idiopathic_pulmonary_fibrosis
40	326	Acceptable: Acceptable: GLPG1690_(Ziritaxestat)_treats_Idiopathic_pulmonary_fibrosis
41	328	Acceptable: Ridaforolimus_treats_Lymphangioleiomyomatosis
42	330	Acceptable: Ensifentrine_treats_Primary_ciliary_dyskinesia

colleenXu commented 6 months ago

Addition! For TestCase_26, it looks like the input curie has an extra trailing space: "input_curie": "MONDO:0018958 ". And the corresponding assets (174-177) all have this extra space.

This probably explains why the tools skip the query, return no results, or throw errors for all of these assets.
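A minimal sketch of how stray whitespace like this could be caught before a run, assuming each asset is available as a dict with `input_curie` and `output_curie` keys. The `id` key, the `Asset_174` / `Asset_999` labels, and the `find_untrimmed_curies` helper are hypothetical; only the `"MONDO:0018958 "` value comes from this thread.

```python
def find_untrimmed_curies(assets):
    """Return (asset_id, field, raw_value) for CURIE fields with stray whitespace."""
    problems = []
    for asset in assets:
        for field in ("input_curie", "output_curie"):
            value = asset.get(field)
            # A CURIE that changes when stripped has leading or trailing whitespace.
            if isinstance(value, str) and value != value.strip():
                problems.append((asset.get("id"), field, value))
    return problems


# Illustrative records; the ids are made up for this example.
assets = [
    {"id": "Asset_174", "input_curie": "MONDO:0018958 "},
    {"id": "Asset_999", "input_curie": "MONDO:0007147"},
]
print(find_untrimmed_curies(assets))
```

Reporting the raw value rather than silently stripping it keeps the fix visible in the source data, which seems preferable if several assets share the same typo.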

maximusunc commented 6 months ago

@sierra-moxon could this issue potentially be moved to the feedback repo for TAQA?

jaredroach commented 6 months ago

See some of the other issues for specific explanations of some of these universal failures, e.g. https://github.com/NCATSTranslator/Tests/issues/93 for Asset #79.

jaredroach commented 6 months ago

Re: Asset #69 "Acceptable: Soot_treats_Obstructive_Sleep_Apnea"

Why is this acceptable? Shouldn't it be "Never Show"? Where is the research that shows soot treats OSA?? Or other inference? If anything, soot causes OSA. Currently the answer from Translator for "What drugs treat OSA" does not include soot, so I cannot query the provenance via Translator. https://ui.ci.transltr.io/main/results?l=Obstructive%20Sleep%20Apnea%20Syndrome&i=MONDO:0007147&t=0&r=0&q=670443ba-f001-450b-9709-c2d9d54c7d65
Asset #69 is a duplicate of Asset #68 (also "Acceptable: Soot_treats_Obstructive_Sleep_Apnea"), which a number of Tools are passing.
This is the Information Radiator description for Asset #69: https://informationradiator.renci.org/test-runs/37/tests/5975#log-0
Calling ARS Test Runner with: { "environment": "ci", "predicate": "treats", "runner_settings": [ "inferred" ], "expected_output": "Acceptable", "input_curie": "MONDO:0007147", "output_curie": "MESH:D053260" }
This is the (blank!) Information Radiator description for Asset #68: https://informationradiator.renci.org/test-runs/37/tests/5974#log-0
No logs

Maybe some tools are passing Asset #68 because it is blank?

maximusunc commented 6 months ago

@jaredroach The difference between Asset #68 and Asset #69 is that the former is NeverShow and the latter is Acceptable. Both are listed because each could be perceived as "correct" by different user personas. An argument could be made that the NeverShow test should not exist, but that is more a question for TAQA.
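A rough sketch of how pairs like Assets 68 and 69 could be surfaced automatically: group assets by the query they describe and flag groups whose `expected_output` values disagree. The field names follow the ARS Test Runner payload quoted earlier in this thread; the `id` key, the `find_conflicting_assets` helper, and the assumption that Asset 68 carries the same CURIEs as Asset 69 are illustrative only.

```python
from collections import defaultdict


def find_conflicting_assets(assets):
    """Map each (input, predicate, output) triple to asset ids when expectations differ."""
    groups = defaultdict(list)
    for asset in assets:
        key = (asset["input_curie"], asset["predicate"], asset["output_curie"])
        groups[key].append(asset)

    conflicts = {}
    for key, members in groups.items():
        # More than one distinct expectation for the same triple is a conflict.
        if len({m["expected_output"] for m in members}) > 1:
            conflicts[key] = sorted(m["id"] for m in members)
    return conflicts


# Assumed records: Asset 68 is taken to share Asset 69's CURIEs.
assets = [
    {"id": 68, "input_curie": "MONDO:0007147", "predicate": "treats",
     "output_curie": "MESH:D053260", "expected_output": "NeverShow"},
    {"id": 69, "input_curie": "MONDO:0007147", "predicate": "treats",
     "output_curie": "MESH:D053260", "expected_output": "Acceptable"},
]
print(find_conflicting_assets(assets))
```

With pass/fail scoring, no tool can pass both members of such a pair, so a listing like this may help decide which assets belong in a separate persona-dependent suite.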

The Information Radiator is not without its bugs, and that is what is happening with the "No logs". Unless something catastrophic has happened with the tests, there should always be some logs, but you just have to refresh the page if it ends up showing you that message.

jaredroach commented 6 months ago

@maximusunc If I understand it correctly, a Tool will fail an "Acceptable" test if it does not report the result in the top 50%. That means that Tools that do not report "soot" as a top-50% treatment for OSA are going to get dinged, which is definitely not the intent of these Tests. We should delete Asset #69.

1_TopAnswer: The expected output is in the top 10% of results
2_Acceptable: The expected output is in the top 50% of results
3_BadButForgivable: The expected output is either not present or in the bottom 50% of results
4_NeverShow: The expected output is not in the results
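The four definitions above can be sketched as a single pass/fail check. This is an illustration under stated assumptions, not the Test Runner's actual code: `passes_test` is a hypothetical name, `rank` is taken to be the 1-based position of the expected output in a tool's results (or `None` if absent), and the 10% and 50% boundaries are treated as inclusive.

```python
def passes_test(expected_output, rank, total):
    """Return True if a result at 1-based `rank` of `total` satisfies the expectation.

    `rank` is None when the expected output is absent from the results.
    """
    if expected_output not in ("TopAnswer", "Acceptable", "BadButForgivable", "NeverShow"):
        raise ValueError(f"unknown expected_output: {expected_output}")
    if rank is None:
        # Absence only satisfies the two "should not be prominent" categories.
        return expected_output in ("BadButForgivable", "NeverShow")
    if expected_output == "TopAnswer":
        return rank * 10 <= total   # top 10%, integer arithmetic avoids float error
    if expected_output == "Acceptable":
        return rank * 2 <= total    # top 50%
    if expected_output == "BadButForgivable":
        return rank * 2 > total     # bottom 50%
    return False                    # NeverShow fails whenever the output is present


# A tool that does not return soot at all fails the Acceptable asset
# but passes a NeverShow asset for the same query.
print(passes_test("Acceptable", None, 200))
print(passes_test("NeverShow", None, 200))
```

Under these assumptions, a tool that omits the expected output always fails an Acceptable test, which is the penalty described in the comment above.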

sandrine-muller-research commented 5 months ago

Hi @jaredroach, the reason I listed "soot treats sleep apnea" as a mechanistically acceptable answer is that soot is likely one of the causes. By performing operations across different results (including the causes), I am likely to get better results for the graph I am looking for.

jaredroach commented 5 months ago

@sandrine-muller-research OK. But you are not disagreeing with me that Asset #69 should be deleted, are you? We don't want to penalize tools that don't report soot.

sandrine-muller commented 5 months ago

When this sheet was created, the rule "2_Acceptable: The expected output is in the top 50% of results" was not defined that way. If my memory serves me well, we had another definition of Acceptable, which depended on the persona. I will remove those lines and build another suite from them that focuses on user preference and may give inconsistent results depending on the persona. Those tests will differ from pass/fail tests.

NCATSTranslator / Tests

Noticed that all tools are failing some tests #32