National-Clinical-Cohort-Collaborative / Synthetic-Clinical-Data

N3C Synthetic Clinical Data Workstream

Synthetic data sets for all data contributors #3

Open mgkahn opened 4 years ago

mgkahn commented 4 years ago

Posting here for discussion regarding the expectation that ALL N3C data contributors will obtain a synthetic data set that reflects their COVID cohort submission. Current plans are for only selected institutions to receive synthetic data sets. I am seeking to set an expectation that N3C will deliver synthetic data sets to every N3C data partner in recognition of each partner's contribution to the data network.

mellybelly commented 4 years ago

Hi Michael, We expect that, if the pilot to evaluate the generation of the synthetic data has a positive outcome, we will indeed be able to generate synthetic data for the entirety of the LDS after it has gone through the data quality and harmonization pipeline.

The pilot evaluation should include scientific/methodological comparisons against use of the real data for robustness, an understanding of variability in the data within and between sites and how this affects synthetic data production, evaluation and assurance of de-identification, and ideally a comparison of synthetic data generation methods, among other things.

I would kindly invite you to join the following workstream calls to bring your ideas and requirements and help us make the pilot successful: 'Data Partnership & Governance', for community recommendations to NIH on the Data Sharing Agreement; 'Collaborative Analytics - Clinical Scenarios subgroup', for designing analytical workflows that could be used as comparators; and of course 'Synthetic Data', for helping to design the pilot. See bit.ly/n3c-join-instructions for details.

mgkahn commented 4 years ago

Melissa: Your response points to a singular "entirety of the LDS" synthetic data set. I understand the value that will bring to the data network. My posting was setting expectations for site-specific synthetic data sets. Maybe I have misinterpreted earlier communications with Philip Payne. Are there plans for N3C to generate site-specific synthetic data sets in addition to an All-Network version?

mellybelly commented 4 years ago

Michael, If the pilot is successful, we would anticipate being able to provide a variety of synthetic datasets created according to different criteria, and this would include being able to return one to each contributing site. This would not be limited to specific sites.