ExposuresProvider / icees-api


Explore ICEES+ API multivariate approach #254

Closed karafecho closed 11 months ago

karafecho commented 1 year ago

Recently, I worked with several students to develop and test a new ICEES+ functionality that supports the generation of multivariate tables for applications such as regression and random forest. Those students have since moved on, leaving me as the sole person who currently understands the approach and can implement it. As such, I thought it might be useful for you to explore the functionality a bit ... but only if this is of interest to you.

Here's the initial methods paper: Fecho K, Haaland P, Krishnamurthy A, Lan B, Ramsey S, Schmitt PL, Sharma P, Sinha M, Xu H. An approach for open multivariate analysis of integrated clinical and environmental exposures data. Inform Med Unlocked 2021;26:100733. https://doi.org/10.1016/j.imu.2021.100733. https://pubmed.ncbi.nlm.nih.gov/35875189/.

And here's the repo we worked from: https://github.com/ExposuresProvider/icees-kp-analytics. It's private, but it's under the ExposuresProvider org, so I think you should have access. Please let me know if you don't.

karafecho commented 1 year ago

We have four additional papers that I can send you, if that would be helpful. Just let me know.

hyi commented 1 year ago

@karafecho This is pretty interesting and I have access to the private repo. I have downloaded the PubMed paper and will read that paper to start with. Yes, it'd be great if you can send me the other 4 papers if you have them handy. Thanks

karafecho commented 1 year ago

Yeah, I figured out the approach during a whiteboarding session with a very persistent student from the NC School of Science & Math who insisted on moving beyond bivariate associations. Took some creative thinking, but I soon realized that you could leverage the dynamic cohort creation functionality to generate multivariate tables through iterative requests to the OpenAPI for bivariate associations. The approach has limitations to be sure (e.g., data loss), but it's completely open and approved by the CDWH Oversight Committee (I formally requested approval, as I was a bit worried about certain aspects of the multivariate functionality), so definitely valuable for exploratory analysis.
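The iterative idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ICEES+ implementation: the toy `RECORDS` list, the variable names, and the helper functions `bivariate_counts` and `multivariate_table` are all hypothetical, and `bivariate_counts` is a local stand-in for what would be one cohort-definition request plus one bivariate-association request to the service. No real ICEES+ endpoint or payload format is used here.

```python
from collections import Counter
from itertools import product

# Toy patient-level records standing in for data held behind the service.
# Every variable name and value here is made up for illustration.
RECORDS = [
    {"AsthmaDx": 1, "PM25": "high", "Sex": "F", "Obesity": 1},
    {"AsthmaDx": 1, "PM25": "high", "Sex": "F", "Obesity": 1},
    {"AsthmaDx": 0, "PM25": "low", "Sex": "F", "Obesity": 0},
    {"AsthmaDx": 1, "PM25": "low", "Sex": "M", "Obesity": 0},
    {"AsthmaDx": 0, "PM25": "high", "Sex": "M", "Obesity": 1},
    {"AsthmaDx": 0, "PM25": "low", "Sex": "M", "Obesity": 0},
]


def bivariate_counts(cohort_filter, row_var, col_var):
    """Hypothetical stand-in for one bivariate request on one cohort.

    Returns counts of (row_var, col_var) value pairs among the records
    that match every variable/value pair in cohort_filter.
    """
    counts = Counter()
    for rec in RECORDS:
        if all(rec[name] == value for name, value in cohort_filter.items()):
            counts[(rec[row_var], rec[col_var])] += 1
    return counts


def multivariate_table(outcome, feature, strata):
    """Stack bivariate tables from many cohorts into one multivariate table.

    strata maps each additional variable name to the list of levels to
    iterate over. One cohort is defined per combination of levels, and
    one bivariate table is requested per cohort.
    """
    names = list(strata)
    table = {}
    for levels in product(*(strata[name] for name in names)):
        cohort = dict(zip(names, levels))
        for (o, f), n in bivariate_counts(cohort, outcome, feature).items():
            # Key layout: (outcome, feature, *strata levels) -> count
            table[(o, f) + levels] = n
    return table


table = multivariate_table(
    "AsthmaDx", "PM25", {"Sex": ["F", "M"], "Obesity": [0, 1]}
)
for key in sorted(table):
    print(key, table[key])
```

Each row of the resulting table is a combination of outcome, feature, and stratum levels with its count, which is the shape a regression or random forest routine can consume after expansion. In this toy version the number of cohorts, and so the number of requests, grows with the product of the stratum level counts.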

Here are three of the papers:

https://pubmed.ncbi.nlm.nih.gov/34769911/
https://www.medrxiv.org/content/10.1101/2022.12.20.22283734v1
https://renci.org/technical-reports/tr-22-01/

Priya's paper has not yet been accepted for publication, and she didn't create a preprint, so I probably shouldn't post it to a public repo. I'll share it after it's accepted for publication (it was just resubmitted with minor revisions, per reviewer request).

A few things I've been thinking about, and would love to brainstorm with you about, include the following:

  1. How can we expose the multivariate functionality as a new ICEES+ endpoint, while maintaining all regulatory requirements and supporting user choice in, e.g., outcomes and feature variables? With a generic script and a warning to users about the limitations such as data loss, my gut feeling is that this should be relatively straightforward.
  2. How can we expose the multivariate functionality as part of Translator? See https://github.com/NCATSTranslator/OperationsAndWorkflows/issues/72. I think Translator folks are interested in this. In fact, it's been on the agenda for more than one meeting of the Ops & Workflows WG. Just not sure how to make this work. An alternative, I think, is to run prescribed regression analyses, for example, tailored to each use case, and expose the model results.

karafecho commented 11 months ago

Hong and I identified and implemented a solution to support open multivariate analysis using the ICEES+ OpenAPI. We also have identified and are implementing an approach to support open longitudinal multivariate analysis by exposing and leveraging two key feature variables: study_period and PatientID. The latter approach will be tested initially using the ICEES+ PCD instance.

Closing this issue as it has been replaced by #285 and #286.