National-Clinical-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization

Management and communication of CDM mapping heuristics #17

Open DaveraGabriel opened 4 years ago

DaveraGabriel commented 4 years ago

There remain outstanding decisions regarding the management of missing information and the misalignment of heuristics when mapping to the OMOP model. Mapping requires decisions that may affect the availability or the semantics of data in the set. Downstream analytics users will likely hold a different set of assumptions about the data they are using. OMOP has documentation that outlines definitions and implementation decisions. For cases where the N3C implementation will differ from these guidelines, we need a process for determining and communicating the rules / principles for data heuristics that serve the needs of the end users.

DaveraGabriel commented 4 years ago

Issues requiring exploration and community input include:

  1. Missing data (imputation?). For example, replace a missing drug end date with the start date? Calculate imputed lab values for missing lab values?
  2. Normalization of units of measurement (e.g., spelling errors…)
  3. Multiselect of values in demographics: Gender, Race, Ethnicity…
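To make the drug end date and unit spelling examples concrete, here is a minimal sketch of what such a heuristic could look like if it also recorded that it was applied, so analysts can filter or audit the affected rows. The column names `drug_exposure_start_date` and `drug_exposure_end_date` follow the OMOP DRUG_EXPOSURE table; the `end_date_imputed` flag, the `UNIT_SYNONYMS` table, and both function names are hypothetical and do not reflect any N3C decision.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DrugExposure:
    drug_exposure_start_date: date
    drug_exposure_end_date: Optional[date]
    # Hypothetical provenance flag: True only when the end date was imputed.
    end_date_imputed: bool = False


def impute_end_date(row: DrugExposure) -> DrugExposure:
    """Fill a missing end date with the start date and mark the row as imputed."""
    if row.drug_exposure_end_date is not None:
        return row  # nothing to impute; leave the source value untouched
    return replace(
        row,
        drug_exposure_end_date=row.drug_exposure_start_date,
        end_date_imputed=True,
    )


# Hypothetical synonym table; a real one would be curated by the community.
UNIT_SYNONYMS = {"mg/dl": "mg/dL", "mgs/dl": "mg/dL", "mmol/l": "mmol/L"}


def normalize_unit(raw: str) -> Optional[str]:
    """Map a free-text unit to a canonical spelling, or None if unrecognized."""
    key = raw.strip().lower().replace(" ", "")
    return UNIT_SYNONYMS.get(key)
```

Returning `None` for an unrecognized unit, rather than guessing, keeps the unmapped values visible so they can be reviewed instead of silently altered.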

hlehmann17 commented 4 years ago

There's a more abstract way of posing these questions:

  1. Are there any “gotta haves” in the process to ensure that the analysts get the data they need? a. Related, any idea on how to motivate the analysts to engage in the more upstream processes?
  2. Is there any metadata analysts wish they could get from the sites to help them interpret some of their data?

The last point seems especially relevant in the COVID research effort: I know we are including “flags” for COVID infection, but you should probably know what each site’s set of criteria is for setting that flag. (FHIR has a new standard, Group, that could be used to describe a computable phenotype more intentionally than by value sets. We’re using it in the new Evidence/Evidence Variable [element: Characteristic/definition/definition Reference] Resources.)
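As a rough illustration of the kind of site-level metadata meant here, the sketch below records which criteria each site uses to set a flag and groups sites that share the same criteria. Everything in it is hypothetical: the `FlagCriteria` structure, the site identifiers, and the criteria labels are placeholders, not actual N3C site definitions and not a FHIR Group representation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FlagCriteria:
    site_id: str
    flag_name: str
    criteria: FrozenSet[str]  # placeholder labels for how the site sets the flag


def group_sites_by_criteria(
    records: List[FlagCriteria],
) -> Dict[FrozenSet[str], List[str]]:
    """Group site ids by the exact set of criteria they report for a flag."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for rec in records:
        groups[rec.criteria].append(rec.site_id)
    return dict(groups)


# Placeholder records for illustration only.
records = [
    FlagCriteria("site_A", "covid_flag", frozenset({"positive_pcr"})),
    FlagCriteria("site_B", "covid_flag", frozenset({"positive_pcr", "diagnosis_code"})),
    FlagCriteria("site_C", "covid_flag", frozenset({"positive_pcr"})),
]

for criteria, sites in group_sites_by_criteria(records).items():
    print(sorted(criteria), sites)
```

With metadata like this, an analyst could see at a glance that the same flag means different things at different sites before pooling the data.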


DaveraGabriel commented 4 years ago

See also: Develop an N3C mapping metadata schema to support downstream reproducibility and analyses #37 (https://github.com/National-COVID-Cohort-Collaborative/Data-Ingestion-and-Harmonization/issues/37) and Linking DI&H to Analytics #22. These seem to be the same issue.