Open DaveraGabriel opened 4 years ago
Issues requiring exploration and community input include:
- Missing data (imputation?). For example, replace a missing drug end date with the start date? Calculate imputed lab values for missing lab values?
- Units of measurement normalization (e.g., spelling errors…)
- Multiselect of values in demographics: Gender, Race, Ethnicity…
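To make the drug end date question concrete, here is a minimal sketch of what such a rule might look like if it were adopted. It is not an agreed N3C heuristic. The date field names follow the OMOP `drug_exposure` table; the `end_date_imputed` flag is a hypothetical addition, included only to show how an imputation could stay visible to downstream analysts.

```python
from datetime import date
from typing import Any, Dict


def impute_drug_end_date(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with a missing end date set to the start date.

    `end_date_imputed` is a hypothetical provenance flag (not an OMOP field)
    that records whether the end date was filled in by this rule.
    """
    out = dict(row)  # copy so the source record is left untouched
    if out.get("drug_exposure_end_date") is None:
        out["drug_exposure_end_date"] = out["drug_exposure_start_date"]
        out["end_date_imputed"] = True
    else:
        out["end_date_imputed"] = False
    return out


# Example records: one with a missing end date, one complete.
missing = {
    "drug_exposure_start_date": date(2020, 4, 1),
    "drug_exposure_end_date": None,
}
complete = {
    "drug_exposure_start_date": date(2020, 4, 1),
    "drug_exposure_end_date": date(2020, 4, 10),
}

imputed_missing = impute_drug_end_date(missing)
imputed_complete = impute_drug_end_date(complete)
```

Carrying a flag like this is one way to let analysts exclude or separately handle imputed values; whether to impute at all is exactly the decision this issue asks the community to make.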
There's a more abstract way of posing these questions:
The last point seems especially relevant in the covid research effort. I know we are including “flags” for covid infection, but you should probably know what each site’s criteria are for setting that flag. (FHIR has a new standard, Group, that could be used to describe a computable phenotype more intentionally than by value sets. We’re using it in the new Evidence/Evidence Variable [element: Characteristic/definition/definition Reference] Resources.)
See also: Develop an N3C mapping metadata schema to support downstream reproducibility and analyses #37 (https://github.com/National-COVID-Cohort-Collaborative/Data-Ingestion-and-Harmonization/issues/37) and Linking DI&H to Analytics #22. These seem to be the same issue.
There remain outstanding decisions regarding the management of missing information and the misalignment of heuristics when mapping to the OMOP model. Mapping requires decisions that may affect the availability or semantics of data in the set. Downstream Analytics users will likely have a different set of assumptions about the data they are using. OMOP has documentation that outlines definitions and implementation decisions. For cases where the N3C implementation will differ from these guidelines, a process is needed for determining and communicating rules and principles for data heuristics that serve the needs of the end users.