Open gowthamrao opened 3 years ago
Examples of metrics that may suggest whether a cohort definition implemented in an 11th data source behaves similarly to how it behaves in the previous 10 data sources:
At covariate/feature/attribute level
a composite score
distribution of differences: Cohort Diagnostics computes the standardized difference for any two-way comparison, so we can calculate the standardized difference for each covariate for any pair of data sources. If we have 10 data sources, that gives us C(10, 2) = 45 pairwise combinations of standardized differences. For the 11th data source, we compare its standardized differences against the differences seen among the other data sources.
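A minimal sketch of the pairwise comparison described above, for a single binary covariate. It assumes the pooled-variance form of the standardized difference for proportions, which may not match exactly what Cohort Diagnostics computes. The function names, the example prevalences, and the "exceeds the largest reference difference" rule are hypothetical choices for illustration, not an agreed diagnostic.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Tuple


def std_diff(p1: float, p2: float) -> float:
    """Standardized difference between two proportions (pooled-variance form)."""
    pooled = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled == 0:
        # Both proportions are 0 or 1: identical gives 0, otherwise unbounded.
        return 0.0 if p1 == p2 else float("inf")
    return (p1 - p2) / sqrt(pooled)


def pairwise_abs_std_diffs(prevalence: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Absolute standardized difference for every unordered pair of data sources."""
    return {
        (a, b): abs(std_diff(prevalence[a], prevalence[b]))
        for a, b in combinations(sorted(prevalence), 2)
    }


def exceeds_reference(prevalence: Dict[str, float], new_value: float) -> bool:
    """True if the new source differs from some reference source by more than
    any two reference sources differ from each other (hypothetical rule)."""
    reference_max = max(pairwise_abs_std_diffs(prevalence).values())
    new_max = max(abs(std_diff(new_value, p)) for p in prevalence.values())
    return new_max > reference_max


# Hypothetical prevalence of one covariate in 10 reference data sources.
reference = {f"db{i}": 0.10 + 0.01 * i for i in range(10)}
pairs = pairwise_abs_std_diffs(reference)
flag_similar = exceeds_reference(reference, 0.15)
flag_different = exceeds_reference(reference, 0.60)
```

In practice this would be repeated per covariate, and a percentile of the reference distribution may be a more robust cut-off than its maximum.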
Tagging @clairblacketer @jreps as this idea came from a recent discussion
Candidate Cohort level attributes:
Levels of diagnostics:
A practical use case is a network study in which data sources are added incrementally (new databases) or an existing data source is updated (new version). The cohort definitions used in the study were NOT evaluated on these new or updated databases.
This would help us identify scenarios where:
The focus is on study cohort + database diagnostics, with the intent to automate them to at least an alerting system that flags issues for review, instead of having to parse through all diagnostics for all cohorts each time there is a new or updated database.
The initial set of diagnostic rules are documented here
@gowthamrao I would like to consider this as an additional "tag" on a cohort in the phenotype library, but how we do this is difficult, as diagnostics are currently interpreted subjectively based on output. E.g. we don't have a binary pass or fail based on some numeric value, as we do in CohortMethod or SCCS, so inclusion here may be difficult.
In the presence of data source heterogeneity, can we empirically determine whether a data source is fit for use with a phenotype or cohort definition? Data source heterogeneity may be due to differences in the underlying populations that contribute to the data source, and to differences in data capture processes.
Data Source level determination - is a datasource fit for use for a study?
Researchers determine fitness for use for their study based on a judgment of whether a data source faithfully represents the underlying population being studied. The first step in making that determination is a review of the source data documentation (e.g. documentation provided by the data partner) to answer questions like:
This is followed by understanding the source data capture processes that were used to populate the clinical experience of the population: are they accurate and complete?
OHDSI tools allow for further empirical determination: OHDSI has tools (Achilles characterization and Data Quality Dashboard) that present data source level characteristics. These tools provide empirical data that may be used in conjunction with data source documentation to make the determination.
Cohort level determination - are the cohort definition(s) being used in a study appropriate for the data source? While data source level determination is a higher-level decision based on whether an underlying population is fit for use for the study, the cohort level determination is more granular: it helps determine whether a cohort definition faithfully extracts the right cohort from the population in the data source. A common example: because a data source has different data capture processes (coding practices), the definition may not yield the right set of persons in the cohort (for instance, because of orphaned codes).
Cohort Diagnostics is a new tool that provides diagnostics on cohorts instantiated on a data source. It enables both within and across data source comparison (i.e. a cohort instantiated on one data source may be compared to the same cohort instantiated on another data source). Based on observations from the diagnostics, determinations may be made as to whether a cohort definition as instantiated on one data source is systematically different (and potentially different from expected) compared to other data sources.
Determination based on comparison to expected: Cohort Diagnostics allows us to check whether the attributes of a cohort as instantiated in a data source are comparable to what is expected. E.g. if we are building a cohort of persons who are pregnant, we have an a priori expectation that the age distribution will fall within childbearing age. If we observe that the age distribution in the data source is not in the expected range, it may indicate that the cohort definition as applied to that data source is not fit for use.
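The comparison to an a priori expectation could be automated along these lines. This is only a sketch: the age bounds (12 to 55), the 95% minimum share, and the function names are hypothetical placeholders that a study team would need to set for each cohort, and none of it comes from Cohort Diagnostics itself.

```python
from typing import Iterable


def share_within_expected(ages: Iterable[float], low: float, high: float) -> float:
    """Fraction of cohort members whose age falls inside [low, high]."""
    ages = list(ages)
    if not ages:
        raise ValueError("cohort has no ages to evaluate")
    inside = sum(1 for age in ages if low <= age <= high)
    return inside / len(ages)


def flag_unexpected_age(
    ages: Iterable[float],
    low: float = 12,        # hypothetical lower bound for a pregnancy cohort
    high: float = 55,       # hypothetical upper bound
    min_share: float = 0.95,  # hypothetical tolerance
) -> bool:
    """True when too few cohort members fall in the expected age range."""
    return share_within_expected(ages, low, high) < min_share


# Hypothetical ages at cohort entry in one data source.
observed_share = share_within_expected([25, 30, 35, 70], 12, 55)
needs_review = flag_unexpected_age([25, 30, 35, 70])
```

A flag here is a prompt for human review rather than an automatic rejection of the data source.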
Determination based on comparison to other datasources: Similarly, it is possible to compare if one data source is similar or different to a set of benchmark datasources.
However, these determinations are currently less systematic. The number of data points is overwhelming, and we lack a rubric or decision rule that allows us to empirically accept or reject a data source for a study. One approach is to build an 'empirical metric' for fitness for use and base the decision on that metric. Ways to compute such a metric include similarity metrics, distance metrics, or setting an expected benchmark based on a starter set of data sources and comparing new data sources to that benchmark.
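As one illustration of the benchmark idea, the sketch below represents each data source as a vector of covariate prevalences, takes the centroid of a starter set as the benchmark, and compares a new source's Euclidean distance to that centroid with the largest distance seen among the starter sources. The choice of Euclidean distance, the function names, and the example values are assumptions made for this sketch; other similarity or distance metrics could be swapped in.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Tuple


def centroid(vectors: List[List[float]]) -> List[float]:
    """Covariate-wise mean across the benchmark data sources."""
    return [mean(column) for column in zip(*vectors)]


def distance_to_benchmark(
    benchmark: Dict[str, List[float]], new_vector: List[float]
) -> Tuple[float, float]:
    """Return (distance of the new source to the benchmark centroid,
    largest distance of any benchmark source to that centroid)."""
    center = centroid(list(benchmark.values()))
    reference_max = max(dist(vector, center) for vector in benchmark.values())
    return dist(new_vector, center), reference_max


# Hypothetical covariate prevalences (same covariate order in every vector).
starter_set = {"db_a": [0.1, 0.2], "db_b": [0.3, 0.4]}
near_distance, reference_distance = distance_to_benchmark(starter_set, [0.2, 0.3])
far_distance, _ = distance_to_benchmark(starter_set, [0.9, 0.9])
```

A new source whose distance exceeds the reference distance would be a candidate for flagging. Whether that comparison is a sensible accept or reject rule is exactly the open question raised above.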