chorus-ai / chorus-developer

Repository to define conventions for development and provide an overview of CHoRUS software and tooling
https://chorus-ai.github.io/chorus-developer/

Create matrix of project deliverables and the software packages/repos that support those deliverables #10

Open jshoughtaling opened 1 month ago

jshoughtaling commented 1 month ago

To eliminate ambiguity about which development efforts support which deliverables, we will catalog the deliverables, align each one with the packages that support it, and display the result on this site.

jshoughtaling commented 1 month ago

@del42 - I created a collapsible table structure below to capture items per team and per major deliverable.

Let me know if you think this is a reasonable way to represent the data and I'll do it for the other teams and then publish it on the developer site.

Data Acquisition

1. Site startup

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 1.1) Project Readiness | | | |
| 1.2) Ensure IRB readiness | | | |

2. Cohort Sampling and Size Justification

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 2.1) Identify intellectual property concerns at each site | | | |
| 2.2) Ensure diversity in site data | | | |
| 2.3) Obtain population-representative inferences | JARED | container-apps [atlas] | |

3. Federation process to access data from all patients

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 3.1) Ensure federated process to identify cohorts and request data | JARED | container-apps [atlas] | |
| 3.2) Document federated processes, with tooling module | | | |
| 3.3) Define and prepare meta dataset | JARED | container-apps [atlas] | |

4. Structured Data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 4.1) Prepare OMOP Data | JARED | container-apps [etl] | Central Processing SOP |
| 4.2) Prepare to query OMOP data | JARED | container-apps [etl] | |

5. Obtaining High-resolution physiological data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 5.1) Integrate OMOP vocabulary for physiological data | JARED | chorus-mapping [vocab] | SOP |
| 5.2) Standardize signal processing in waveforms | BRIAN | chorus_waveform | |
| 5.3) Quality control | | | |

6. Collection and processing of image data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 6.1) Ensure collection of image data at all sites | | | |
| 6.2) Ensure generation of metadata from images | | | |
| 6.3) Ensure collection of image data at all sites | | | |
| 6.4) Ensure linkage to other data domain | JARED | container-apps [registry] | MM SOP |

7. Collection and Processing of Clinical notes

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 7.1) Customize NLP algorithm | | | |
| 7.2) Validate the NLP tool using honest brokers | | | |
| 7.3) Unstructured EHR extraction | | | |
| 7.4) Implement NLP tool | | | |

8. Mining for socioeconomic status data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 8.1) Identify common data SDoH elements from collective experience of sites | | | |
| 8.2) Identify and resolve discrepancies in performance of SDoH elements | | | |

9. Collection of contextual SDoH data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 9.1) Review SDoH variables and data sources | | | |
| 9.2) Prepare geospatial crosswalk datasets | | | |

10. Linkage among EHR

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 10.0) Design and communicate linkage SOP | JARED | | MM SOP |
| 10.1) Retrieve EHR data through MRNs at each site | JARED | container-apps [registry] | MM SOP |
| 10.2) Minimal EHR dataset elements created and communicated to all sites | | | |
| 10.3) EHR data extracted | JARED | | |
| 10.4) Site-specific EHR extracts validated | | | |
| 10.5) Implement tools to accurately link EHR and physiologic data | JARED | container-apps [registry] | |
| 10.6) Gap analysis of linkage performed | JARED | | |

11. Deidentification of data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 11.1) Prepare safe harboring approach | | | |
| 11.2) Apply SOP on safe harboring approach | | | |
| 11.3) Quality control of linked safe-harbor datasets | | | |

12. Event Annotation

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 12.1) Develop semi-automatic predictive monitoring model to identify events | | | |
| 12.2) Individual chart review of potential events by clinicians | | | |
| 12.3) Develop phenotype algorithm in OHDSI Phenotype repository | JARED | container-apps [atlas] | |
| 12.4) Implement phenotype algorithm in OHDSI Phenotype repository | JARED | container-apps [atlas] | |
| 12.5) Generate silver-standard labels at scale automatically | JARED | | |
| 12.6) Implement annotation pipeline to combine physiological data for resolved timeline | | | |
| 12.7) Store results both locally and centrally | | | |
| 12.8) Create datasheet describing each cohort | JARED | container-apps [atlas], CHoRUSReports | |

13. Site-specific metadata

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 13.1) Encode de-identified hospital number on each data set | | | |
| 13.2) Report site-specific metadata from full patient dataset | | | |

14. Quality control

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 14.1) Write protocols for quality control checks at all sites | JARED | CHoRUSReports | Quality Central |
| 14.2) Quantify variability when it exists | JARED | CHoRUSReports | Quality Central |
| 14.3) Use the Data Quality Dashboard from OHDSI to evaluate datasets | JARED | container-apps [ares, www-dgs] | Quality Central |

15. Privacy Check

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 15.1) Examine unique identifiability with combinations of attributes to check potential risk of linkability | | | |
| 15.2) Evaluate l-diversity and t-closeness | | | |
| 15.3) For unstructured data, extract concepts through whitelist mechanism and test for linkability | | | |

16. Generate synthetic data

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 16.1) Use generative adversarial network (GAN) approaches to produce site-specific synthetic data sets | | | |
| 16.2) Post the data and the source code | | | |

17. Hold-out validation dataset

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 17.1) Set aside 20% of patients for algorithm testing | | | |
| 17.2) Pursue qualification of reserved dataset for FDA's Medical Device Development Tool program | | | |

18. Data storage platform - CHoRUS

| SUBAIM | INVOLVED | SOFTWARE | DOCUMENTATION |
| :--- | :--- | :--- | :--- |
| 18.1) Organize files by year/month, then Subject ID, data modalities | JARED | container-apps [etl] | Central Processing SOP |
| 18.2) Store dataset in site-specific staging area | JARED | container-apps [etl] | Central Processing SOP |
| 18.3) Implement automated data integrity check; review error logs | JARED | CHoRUSReports | Quality Central |
| 18.4) Copy files to data lake using secure FTP | JARED | container-apps [etl] | Central Processing SOP |