Open DaveraGabriel opened 4 years ago
Agreed, that is a good example of harmonization. I could be convinced to consider a 4th column of HPO reductions. What I had in mind was intermediate, where we preserve the scalar value, rather than a binary variable, but simply put like tests in the same box. Chris
Hi Chris, as you know, this is the whole point of the loinc2hpo library (https://www.nature.com/articles/s41746-019-0110-4). We are now in the process of implementing a Python version that will run on the Palantir site. We could use some help with the planned interface between our system and the Palantir system and would appreciate help/advice. -Peter
Thoughtful comments, Andrew. I agree multidisciplinary teams are what we want to encourage.
I think there is a compromise here. For the most common and important clinical variables, maybe on the order of a few score, I favor creating and forwarding less granular mappings as a third set of columns. As a clinician, I know that values associated with different methodologies, say for serum sodium, are collapsed in graphs and reports in the EHR, since the different methodologies are entirely irrelevant to the clinical interpretation of the values. This is also true for many COVID critical variables such as blood creatinines.
Chris
We concluded our meeting yesterday with a consensus in favor of an extra column with a preferred mapping for concepts, which might be at a less granular level than the mapped standard concept used in the source data. The thought was that this might be another aid to the complex phenotyping work required for model development: a useful default.
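A minimal sketch of what such an extra column could look like, assuming measurements arrive as simple records. The field names and the source codes below are hypothetical placeholders, not real LOINC identifiers and not the N3C schema. The source concept and the scalar value are preserved; only a coarser label is added alongside them.

```python
# Hypothetical source codes and roll-up labels; NOT real LOINC identifiers.
PREFERRED_CONCEPT = {
    "NA-SERUM-METHOD-A": "SODIUM",
    "NA-SERUM-METHOD-B": "SODIUM",
    "CREAT-BLOOD": "CREATININE",
    "CREAT-SERUM": "CREATININE",
}


def add_preferred_column(rows):
    """Return copies of the rows with an extra 'preferred_concept' key.

    The source concept and scalar value are left untouched. Codes without
    a curated roll-up keep their own code as the preferred concept.
    """
    enriched_rows = []
    for row in rows:
        new_row = dict(row)  # copy, so the input records are not modified
        new_row["preferred_concept"] = PREFERRED_CONCEPT.get(
            row["source_concept"], row["source_concept"]
        )
        enriched_rows.append(new_row)
    return enriched_rows


measurements = [
    {"source_concept": "NA-SERUM-METHOD-A", "value": 139.0, "unit": "mmol/L"},
    {"source_concept": "NA-SERUM-METHOD-B", "value": 141.0, "unit": "mmol/L"},
    {"source_concept": "UNMAPPED-TEST", "value": 0.02, "unit": "ng/mL"},
]
enriched = add_preferred_column(measurements)
```

Because the original concept stays in the record, a researcher who needs the method-level distinction can still get at it.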
I agreed and still think that this is likely to make it easier for researchers to do their work. And I agree that ease of use is important.
The consequences of making the easy thing to do the wrong thing to do are worth considering.
As this paper shows, the difference between the features used to develop models and those used to externally validate them is an important source of bias that can seriously decrease a predictive model’s value. https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.8183
Their conclusion is one I agree with and one that clearly reinforces the value of the tool Andrew Girven showed: “prediction models should be derived from and validated on datasets collected with measurement procedures that are in widespread use in the intended clinical setting.”
In N3C’s case, developing a model that reflects measurement practices and procedure versions that are common at my site but uncommon at yours may be a “wrong” model development practice that we want to discourage. An easy mapped-concept option that obscures the distinction between my site’s measurements and yours will discourage consideration of this potentially very important aspect of phenotype development.
I am not arguing against the extra column with the easy mapping option. But I think it will be good to push people to consider the consequences of using it, since they may be important.
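One way to make those consequences visible, sketched under assumptions: before relying on a rolled-up concept, tabulate which source concepts sit underneath it at each site. The record fields, site names, and codes here are hypothetical, and "dominant source concept differs between sites" is only one possible warning signal.

```python
from collections import Counter, defaultdict


def source_mix_by_site(rows):
    """Share of each source concept within every (site, preferred concept) pair."""
    counts = defaultdict(Counter)
    for row in rows:
        key = (row["site"], row["preferred_concept"])
        counts[key][row["source_concept"]] += 1
    return {
        key: {code: n / sum(counter.values()) for code, n in counter.items()}
        for key, counter in counts.items()
    }


def concepts_with_site_differences(mix):
    """Preferred concepts whose most common source concept differs across sites."""
    dominant = defaultdict(set)
    for (_site, concept), shares in mix.items():
        dominant[concept].add(max(shares, key=shares.get))
    return sorted(c for c, tops in dominant.items() if len(tops) > 1)


# Hypothetical records: site_A mostly reports serum creatinine, site_B blood.
rows = (
    [{"site": "site_A", "preferred_concept": "CREATININE",
      "source_concept": "CREAT-SERUM"}] * 3
    + [{"site": "site_A", "preferred_concept": "CREATININE",
        "source_concept": "CREAT-BLOOD"}]
    + [{"site": "site_B", "preferred_concept": "CREATININE",
        "source_concept": "CREAT-BLOOD"}] * 2
    + [{"site": "site_A", "preferred_concept": "SODIUM",
        "source_concept": "NA-SERUM"},
       {"site": "site_B", "preferred_concept": "SODIUM",
        "source_concept": "NA-SERUM"}]
)

mix = source_mix_by_site(rows)
flagged = concepts_with_site_differences(mix)
```

A report like this does not decide whether the roll-up is valid; it only prompts the team to look before using it.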
I hope this type of communication is a useful form of dialog as we try to devise guardrails that strike the right balance between ease of use and best performance and most valid evidence.
I wonder if our thinking about this will best be framed by a research team with the required skills to do the research rather than a hypothetical lone clinician researcher who would likely feel daunted at the task of learning and navigating vocabularies.
We wouldn’t encourage that researcher to do their study without appropriate guidance on statistical methods. Or, if it involved genomics, to do it without appropriate guidance from someone with that expertise. So it stands to reason that we should similarly expect an informatics person who can help guide the appropriate use of clinical data.
Andrew
From: jhu-informatics-team@googlegroups.com On Behalf Of Christopher Chute
Sent: Tuesday, June 23, 2020 11:49 AM
To: jhu-informatics-team@googlegroups.com
Subject: Case status
Our list of things to do just got an addition. I am writing up my recollection of the entire list, which could be added to GitHub. Case status is the new one.
Chris
I guess I'll add that we never really obtained full community agreement on the categories in the phenotype--one woman's "suspected" is another woman's "possible," for example. I agree with Kristin's suggestion to represent these labels in shared cohort definitions rather than persisting them in the database.
Fair enough on the case status categories, I had thought there was more consensus. I do believe that pre-computing some lab parents (all those blood creatinine orphans) would be useful, particularly for the elements used in the characterization paper.
The N3C project requirements and DI&H processes require data outside a standard OMOP implementation. These are primarily data created or used by N3C processes in addition to the source data. Additional database fields (columns) or tables planned for inclusion include, but are not limited to, the following:
1) Concept / code grouping parent identifiers, added as a usability enhancement to the data store. These are "roll-up" codes that provide parent, or less specific, concepts as coded in a CDM, grouping data into substantially similar concept groups that are more amenable to researcher / end-user use. For example, many LOINC or NDC codes can be grouped into a more general parent concept with little impact on computation.
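As an illustration of how parent identifiers might be resolved, here is a hedged sketch that assumes the hierarchy is available as a simple child-to-parent lookup. The codes and the `PARENT_OF` table are hypothetical, not real NDC or LOINC entries; in an OMOP setting the vocabulary's own ancestor relationships would presumably play this role.

```python
from collections import defaultdict

# Hypothetical child -> parent links; NOT real NDC or LOINC codes.
PARENT_OF = {
    "DRUG-X-10MG-TAB": "DRUG-X-ORAL",
    "DRUG-X-20MG-TAB": "DRUG-X-ORAL",
    "DRUG-X-ORAL": "DRUG-X",
    "DRUG-X-IV": "DRUG-X",
}


def roll_up(code, levels=None):
    """Follow parent links upward.

    With levels=None, climb to the top-most ancestor; otherwise climb at
    most `levels` steps. Codes with no parent are returned unchanged.
    """
    seen = {code}
    while code in PARENT_OF and (levels is None or levels > 0):
        code = PARENT_OF[code]
        if code in seen:
            raise ValueError(f"cycle in parent links at {code!r}")
        seen.add(code)
        if levels is not None:
            levels -= 1
    return code


def group_by_parent(codes, levels=None):
    """Group source codes under their rolled-up parent identifier."""
    groups = defaultdict(list)
    for code in codes:
        groups[roll_up(code, levels)].append(code)
    return dict(groups)


groups = group_by_parent(["DRUG-X-10MG-TAB", "DRUG-X-IV", "UNLINKED-CODE"])
```

The `levels` parameter is a design choice in this sketch: it lets a consumer pick how coarse the grouping should be instead of always collapsing to the top of the hierarchy.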
"Peripheral oxygen saturation (SpO2)/fraction of inspired oxygen (FiO2) [ratio], This ratio is currently not part of LOINC or SNOMED. If it is added to one of those terminologies in the future, this guidance will be updated on how to create an appropriate entry in OBS_CLIN."