ExposuresProvider / icees-api

MIT License

Adjust the edge attributes returned from KG endpoint #184

Closed karafecho closed 2 years ago

karafecho commented 2 years ago

The ICEES KG endpoint currently returns quite a bit of info. While this is helpful for users, I'm somewhat concerned that this may add too much "noise" to the Dec demo and thus backfire. This ticket is intended to initiate a team discussion.

ICEES DILI instance

    {
      "chi_squared": 2.3984928470222595,
      "columns": [
        {"frequency": 221, "percentage": 0.8339622641509434},
        {"frequency": 44, "percentage": 0.1660377358490566}
      ],
      "feature_a": {
        "biolink_class": "biolink:ChemicalEntity",
        "feature_name": "PrednisoneOrCorticosteroids_Tx",
        "feature_qualifiers": [
          {"operator": "=", "value": "No"},
          {"operator": "=", "value": "Yes"}
        ],
        "year": null
      },
      "feature_b": {
        "biolink_class": "biolink:Disease",
        "feature_name": "PulmonaryDisease",
        "feature_qualifiers": [
          {"operator": "=", "value": "No"},
          {"operator": "=", "value": "Yes"}
        ],
        "year": null
      },
      "feature_matrix": [
        [
          {"column_percentage": 0.8642533936651584, "frequency": 191, "row_percentage": 0.8488888888888889, "total_percentage": 0.720754716981132},
          {"column_percentage": 0.7727272727272727, "frequency": 34, "row_percentage": 0.1511111111111111, "total_percentage": 0.12830188679245283}
        ],
        [
          {"column_percentage": 0.13574660633484162, "frequency": 30, "row_percentage": 0.75, "total_percentage": 0.11320754716981132},
          {"column_percentage": 0.22727272727272727, "frequency": 10, "row_percentage": 0.25, "total_percentage": 0.03773584905660377}
        ]
      ],
      "p_value": 0.12145221113869516,
      "rows": [
        {"frequency": 225, "percentage": 0.8490566037735849},
        {"frequency": 40, "percentage": 0.1509433962264151}
      ],
      "total": 265
    }

karafecho commented 2 years ago

Update from my Slack message, 10.21.22, related specifically to Workflow B.1:

ICEES DILI returns a contingency matrix in JSON format. Because Workflow B.1 asks about patients with a diagnosis of DILI, ICEES DILI returns responses for all patients in the cohort. This in and of itself is not an issue. However, a contingency table does not make sense in this case and cannot be computed.

At present, the contingency table that is returned is largely empty with Chi Square statistic = 0 and P = 1. This is just noisy information, and it is kind of deceptive to return statistics and a P value when they cannot be calculated. The correct response should probably be Chi Square statistic = NaN, P = NaN.
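
To make the NaN suggestion concrete, here is a rough sketch of a 2x2 Pearson chi-square calculation (no continuity correction) that returns NaN whenever a row or column total is zero. This is not the actual ICEES code; the function name `chi_square_2x2` is made up for illustration, and the p-value shortcut only holds for 1 degree of freedom (i.e., a 2x2 table).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Hypothetical helper, not part of icees-api. Returns (chi_squared, p_value);
    both are NaN when any row or column total is zero, because the expected
    counts are then undefined and the test cannot be computed.
    """
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d
    if total == 0 or 0 in row_totals or 0 in col_totals:
        return math.nan, math.nan

    chi_squared = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi_squared += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_squared / 2))
    return chi_squared, p_value


# Frequencies taken from the feature_matrix in the first comment.
print(chi_square_2x2([[191, 34], [30, 10]]))

# A degenerate table where one feature has a single level.
print(chi_square_2x2([[265, 0], [0, 0]]))
```

One caveat: NaN is not a valid value in strict JSON, so the response might need to carry `null` (or omit the fields) instead, depending on what consumers can parse.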

I am wondering if we can change the format of the answers that ICEES returns, but do it in a generalizable manner that would support any queries of any ICEES instance. As an example, I've appended an answer from COHD below. (Answer #1 from https://arax.ncats.io/?r=30305.)

image

While COHD returns certain calculations that ICEES doesn't, there's no reason why we can't do the same. In fact, this is something I discussed with Hao a few times. At the very least, I feel like we should report ICEES results in a manner that is similar to COHD and more easily digestible by users.
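
As a starting point for discussion, a more digestible answer could be a short, flat list of name/value pairs pulled out of the current result. The sketch below is only an assumption about what that might look like: the function name `summarize_icees_result` and the output attribute names are hypothetical and are not taken from COHD, TRAPI, or the current ICEES code. The input keys are the ones ICEES already returns.

```python
def summarize_icees_result(result):
    """Flatten an ICEES association result into name/value pairs.

    Hypothetical helper: the output attribute names below are placeholders,
    not an agreed-upon schema.
    """
    return [
        {"name": "feature_a", "value": result["feature_a"]["feature_name"]},
        {"name": "feature_b", "value": result["feature_b"]["feature_name"]},
        {"name": "chi_squared", "value": result["chi_squared"]},
        {"name": "p_value", "value": result["p_value"]},
        {"name": "total_sample_size", "value": result["total"]},
    ]


# Trimmed version of the ICEES DILI result from the first comment.
example = {
    "chi_squared": 2.3984928470222595,
    "p_value": 0.12145221113869516,
    "total": 265,
    "feature_a": {"feature_name": "PrednisoneOrCorticosteroids_Tx"},
    "feature_b": {"feature_name": "PulmonaryDisease"},
}

for attribute in summarize_icees_result(example):
    print(attribute["name"], "=", attribute["value"])
```

The full contingency matrix could still be kept as a single optional attribute for users who want it, rather than being the whole answer.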

karafecho commented 2 years ago

Clinical Risk Provider returns results that are structured similarly to COHD:

image

karafecho commented 2 years ago

Closing issue. To be discussed during mini-retreat.