ExposuresProvider / icees-api

MIT License

Explore ways to support patient-level longitudinal multivariate analysis #290

Open karafecho opened 10 months ago

karafecho commented 10 months ago

Having implemented an approach to support cohort- and study period-level longitudinal multivariate analysis, i.e., by allowing users to select year (the study period) as an input feature in a multivariate request, I am opening this issue to suggest that we explore ways to support patient-level longitudinal multivariate analysis. The approach that I had originally conceived was to allow users to select PatientID (the dummy variable that links patients across years / study periods) as an input feature. This would allow users to retrieve a subset of the underlying deidentified integrated feature table. However, the approach is not computationally feasible, given the large patient sample sizes (e.g., roughly 160,000 total patients in the asthma cohort).

One approach might be to put a cap on the cohort size for which users are allowed to include PatientID as an input feature in a multivariate request. To implement this, we could (1) return an error when users attempt to include PatientID as an input feature in a multivariate request for a cohort that exceeds a size limit (TBD) and (2) update the documentation to reflect the limitation. While this approach seems relatively straightforward, it also seems rather arbitrary and statistically unsound.
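To make the cap idea concrete, here is a minimal sketch of the check in step (1). Everything in it is hypothetical: the function name, the error class, and especially the cap value, which the issue leaves TBD. It is not taken from the icees-api codebase.

```python
# Hypothetical sketch; none of these names come from icees-api.
PATIENT_ID_FEATURE = "PatientID"
MAX_COHORT_SIZE_FOR_PATIENT_ID = 1000  # placeholder only; the real cap is TBD


class CohortTooLargeError(ValueError):
    """Raised when PatientID is requested for a cohort above the cap."""


def validate_multivariate_request(feature_names, cohort_size,
                                  cap=MAX_COHORT_SIZE_FOR_PATIENT_ID):
    """Reject requests that include PatientID for a cohort larger than `cap`.

    Requests that do not include PatientID are never rejected by this check.
    """
    if PATIENT_ID_FEATURE in feature_names and cohort_size > cap:
        raise CohortTooLargeError(
            f"{PATIENT_ID_FEATURE} cannot be used as an input feature for "
            f"cohorts larger than {cap} patients (cohort size: {cohort_size})."
        )
```

With this placeholder cap, a request for PatientID on the full asthma cohort (roughly 160,000 patients) would be rejected, while the same request without PatientID would pass through unchanged.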

Another approach might be to create a new multivariate endpoint, one that accepts the following user input: (1) a primary outcome / dependent variable, (2) a set of predictors / independent variables, (3) optional factor(s) to control for repeated observations (e.g., PatientID, year), and (4) a desired multivariate model (e.g., GLM, conditional random forest). The model would then be applied to the data on the backend, and the endpoint would return model output. This approach may work, although (1) we would have to develop general-purpose models and (2) the run time may be long, but that's a lesser concern, IMO.
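As a rough illustration of the four inputs listed above, the request body for such an endpoint could be modeled along these lines. This is only a sketch: the class name, field names, model identifiers, and the example feature names are all assumptions, and it covers input validation only, not model fitting.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the models mentioned in this issue.
SUPPORTED_MODELS = {"glm", "conditional_random_forest"}


@dataclass
class MultivariateModelRequest:
    """Hypothetical request body for the proposed endpoint."""
    outcome: str                 # (1) primary outcome / dependent variable
    predictors: List[str]        # (2) predictors / independent variables
    model: str                   # (4) desired multivariate model
    # (3) optional factor(s) controlling for repeated observations
    repeated_factors: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not self.predictors:
            raise ValueError("At least one predictor is required.")
        if self.outcome in self.predictors:
            raise ValueError("The outcome cannot also be a predictor.")
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(
                f"Unsupported model {self.model!r}; "
                f"expected one of {sorted(SUPPORTED_MODELS)}."
            )


# Example with made-up feature names:
request = MultivariateModelRequest(
    outcome="ed_visits",
    predictors=["pm25_exposure", "age_bin"],
    model="glm",
    repeated_factors=["PatientID", "year"],
)
```

Keeping the repeated-observation factors separate from the predictors lets the backend decide how to use them (e.g., as grouping terms) without returning patient-level rows to the user.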

karafecho commented 10 months ago

Per discussion with Hong, 10.18.2023: Maybe include PatientID as an input parameter, similar to year? E.g., PatientID=1 or PatientID=1-10.
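A minimal sketch of how a value such as `1` or `1-10` might be parsed, assuming PatientID values are positive integers and that a range is inclusive on both ends (neither is confirmed in the discussion). The function name is hypothetical.

```python
def parse_patient_id_spec(spec: str) -> range:
    """Parse "N" or "N-M" into an inclusive range of patient IDs.

    Assumes IDs are positive integers and that "N-M" includes both ends.
    """
    parts = spec.strip().split("-")
    if len(parts) == 1:
        start = end = int(parts[0])
    elif len(parts) == 2:
        start, end = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"Invalid PatientID specification: {spec!r}")
    if start < 1 or end < start:
        raise ValueError(f"Invalid PatientID range: {spec!r}")
    return range(start, end + 1)
```

Because the number of IDs in the returned range is known up front, this form of input would also make it easy to bound how many patients a single request can touch.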

karafecho commented 10 months ago

Related to #286