Fit for use diagnostics - is a cohort definition implemented in a data source fit for use

gowthamrao commented 3 years ago

In the presence of data source heterogeneity, can we empirically determine whether a data source is fit for use for a phenotype or cohort definition? Data source heterogeneity may be due to differences in the underlying populations that contribute to the data source, and to differences in data capture processes.

Data source level determination: is a data source fit for use for a study?

Researchers determine fitness for use for their study by judging whether a data source faithfully represents the underlying population being studied. The first step in this determination is reviewing the source data documentation (e.g., documentation provided by the data partner) to answer questions such as:

Is it based on a defined population (geographic area, type of population)?
Does it have an explicit or inferred observation period?

This is followed by understanding the source data capture processes that were used to populate the clinical experience of the population: are they accurate and complete?

Does it capture both inpatient and outpatient care?
How are drugs captured: as prescriptions, dispensings, directly observed administrations, or over-the-counter non-prescription drugs?
Does it have measurements (both whether a measurement was performed and its result values)?

OHDSI tools allow for further empirical determination: OHDSI has tools (Achilles characterization and the Data Quality Dashboard) that present data source level characteristics. These tools provide empirical data that may be used together with the data source documentation to make the determination.

Achilles provides several dashboards that describe the persons in the data source: their age at first occurrence, gender distribution, follow-up time distribution, and the frequency of concept codes from the different domains (condition, procedure, drug, observation).
Data Quality Dashboard performs a set of checks categorized as completeness, conformance, and plausibility. It helps determine whether the data, once transformed to the OMOP CDM, meet quality standards for conformance, completeness, and plausibility, i.e., whether they are of research-ready quality.

Cohort level determination: are the cohort definitions used in a study appropriate for the data source? Whereas data source level determination is a higher-level decision based on whether the underlying population is fit for use for the study, cohort level determination is more granular: it helps determine whether a cohort definition faithfully extracts the right cohort from the population in the data source. A common example: because a data source has different data capture processes (e.g., coding practices), the cohort definition may not yield the right set of persons (e.g., because of orphan codes).

Cohort Diagnostics is a new tool that provides diagnostics on cohorts instantiated in a data source. It enables comparison both within and across data sources (i.e., a cohort instantiated in one data source may be compared to the same cohort instantiated in another data source). Based on these diagnostics, one may determine whether a cohort definition, as instantiated in one data source, is systematically different from other data sources (and potentially different from what is expected).

Determination based on comparison to expected: Cohort Diagnostics allows us to assess whether the attributes of a cohort, as instantiated in a data source, are comparable to what is expected. For example, if we are building a cohort of persons who are pregnant, we have an a priori expectation that their age distribution falls within childbearing age. If we observe in the data source that the age distribution is not in the expected range, this may indicate that the cohort definition, as applied to that data source, is not fit for use.
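A minimal sketch of the comparison-to-expected idea. The 12 to 55 year bounds for childbearing age and the 5% tolerance are illustrative assumptions (not Cohort Diagnostics defaults), and the age lists are made-up example data.

```python
# Illustrative assumptions (not Cohort Diagnostics defaults):
EXPECTED_MIN_AGE = 12  # hypothetical lower bound of "childbearing age"
EXPECTED_MAX_AGE = 55  # hypothetical upper bound of "childbearing age"


def fraction_outside_expected(ages, lo=EXPECTED_MIN_AGE, hi=EXPECTED_MAX_AGE):
    """Fraction of cohort members whose age at index falls outside [lo, hi]."""
    if not ages:
        raise ValueError("cohort is empty")
    outside = sum(1 for age in ages if age < lo or age > hi)
    return outside / len(ages)


def meets_expectation(ages, max_outside=0.05):
    """True if at most `max_outside` of the cohort is outside the expected age range."""
    return fraction_outside_expected(ages) <= max_outside


# Made-up ages at index for a pregnancy cohort in two data sources.
ages_expected = [24, 31, 29, 35, 27, 41, 19, 33, 30, 26]
ages_suspect = [24, 67, 72, 35, 81, 8, 29, 70, 65, 30]

print(meets_expectation(ages_expected))  # True
print(meets_expectation(ages_suspect))   # False: 6 of 10 ages are outside 12-55
```

In practice the expected range and tolerance would come from clinical knowledge of the phenotype rather than fixed constants.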

Determination based on comparison to other data sources: similarly, it is possible to assess whether one data source is similar to or different from a set of benchmark data sources.

However, these determinations are currently not very systematic. The number of data points is overwhelming, and we lack a rubric or decision rule that allows us to empirically accept or reject a data source for a study. One approach is to build an empirical metric for fitness for use and apply it. Approaches to computing such a metric include similarity metrics, distance metrics, or setting an expected benchmark based on a starter set of data sources and comparing new data sources to that benchmark.
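One possible distance metric, sketched under stated assumptions: each data source is summarized as a vector of covariate prevalences in the cohort, the benchmark is the element-wise mean of the starter data sources, and a new data source is accepted if it is no farther (Euclidean distance) from the benchmark than the farthest starter source. The function names, the choice of Euclidean distance, and the prevalence values are all hypothetical.

```python
import math
from statistics import mean


def benchmark_profile(profiles):
    """Element-wise mean of the starter data sources' covariate-prevalence vectors."""
    return [mean(values) for values in zip(*profiles)]


def distance_to_benchmark(profile, benchmark):
    """Euclidean distance between one data source's profile and the benchmark."""
    return math.dist(profile, benchmark)


def is_within_benchmark(new_profile, starter_profiles):
    """Accept if the new source is no farther from the benchmark than the farthest starter."""
    bench = benchmark_profile(starter_profiles)
    threshold = max(distance_to_benchmark(p, bench) for p in starter_profiles)
    return distance_to_benchmark(new_profile, bench) <= threshold


# Hypothetical prevalences of (diabetes, hypertension, female) in the cohort.
starters = [
    [0.11, 0.30, 0.52],
    [0.13, 0.32, 0.50],
    [0.14, 0.29, 0.55],
    [0.15, 0.33, 0.49],
    [0.16, 0.31, 0.51],
]
print(is_within_benchmark([0.14, 0.31, 0.52], starters))  # True
print(is_within_benchmark([0.65, 0.31, 0.52], starters))  # False
```

Using the largest in-sample starter distance as the threshold is lenient; a percentile of leave-one-out distances would be a stricter variant.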

gowthamrao commented 3 years ago

Examples of metrics that may suggest whether a cohort definition implemented in an 11th data source is similar to its implementation in the previous 10 data sources:

At the covariate/feature/attribute level:
1. The mean age across the 10 data sources is distributed as 28, 35, 45, 64, and 70 years at the 5th, 25th, 50th, 75th, and 95th percentiles (imagine a box plot), while the mean age in the 11th data source is 16 years.
2. The distribution of the proportion of patients with diabetes across the 10 data sources is (11%, 13%, 14%, 15%, 16%), while that of the 11th data source is 65%.
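Both examples amount to checking whether a new value falls outside a percentile band of the benchmark data sources. A sketch, assuming the 5th to 95th percentile band as the acceptance range, with made-up per-source values chosen to resemble the examples:

```python
from statistics import quantiles


def percentile_band(benchmark_values, lower=5, upper=95):
    """(lower, upper) percentiles of one covariate across the benchmark data sources."""
    # With n=100, quantiles() returns the 1st..99th percentile cut points.
    cuts = quantiles(benchmark_values, n=100, method="inclusive")
    return cuts[lower - 1], cuts[upper - 1]


def is_outlier(benchmark_values, new_value, lower=5, upper=95):
    """True if the new data source's value lies outside the benchmark percentile band."""
    lo, hi = percentile_band(benchmark_values, lower, upper)
    return new_value < lo or new_value > hi


# Made-up per-source values for the 10 benchmark data sources.
mean_age = [27, 30, 35, 38, 45, 46, 58, 64, 68, 71]
diabetes = [0.11, 0.12, 0.13, 0.13, 0.14, 0.14, 0.15, 0.15, 0.16, 0.16]

print(is_outlier(mean_age, 16))    # True: 16 years is below the 5th percentile
print(is_outlier(diabetes, 0.65))  # True: 65% is far above the 95th percentile
```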
A composite score:
1. We could calculate a composite score based on the above, or use some other similarity metric.
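As one hypothetical composite score, the sketch below averages the absolute z-scores of the new data source's covariate values relative to the benchmark data sources. The z-score formulation and the example values are assumptions for illustration; any other similarity metric could be substituted.

```python
import math
from statistics import mean, stdev


def covariate_z(benchmark_values, new_value):
    """z-score of the new data source's value relative to the benchmark data sources."""
    mu, sd = mean(benchmark_values), stdev(benchmark_values)
    if sd == 0:
        return 0.0 if new_value == mu else math.inf
    return (new_value - mu) / sd


def composite_score(benchmark, new_source):
    """Mean absolute z-score across covariates; larger means less similar."""
    return mean(abs(covariate_z(values, new_source[name])) for name, values in benchmark.items())


# Made-up benchmark: one value per covariate for each of 10 data sources.
benchmark = {
    "mean_age": [27, 30, 35, 38, 45, 46, 58, 64, 68, 71],
    "diabetes": [0.11, 0.12, 0.13, 0.13, 0.14, 0.14, 0.15, 0.15, 0.16, 0.16],
    "female": [0.48, 0.50, 0.51, 0.52, 0.53, 0.49, 0.50, 0.51, 0.52, 0.54],
}
similar = {"mean_age": 48, "diabetes": 0.14, "female": 0.51}
dissimilar = {"mean_age": 16, "diabetes": 0.65, "female": 0.51}

print(composite_score(benchmark, similar) < composite_score(benchmark, dissimilar))  # True
```

Note that a wide benchmark spread (as with mean age here) dampens a covariate's z-score, so a single extreme covariate such as diabetes can dominate the average.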
Distribution of differences: Cohort Diagnostics computes the standardized difference for any pairwise combination. We can calculate the standardized difference for each covariate for every pair of data sources. With 10 data sources, that gives us 45 (10 choose 2) pairwise standardized differences per covariate. For the 11th data source, we compare its standardized differences against the distribution of differences seen among the other data sources.
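A sketch of the distribution-of-differences idea for a binary covariate, assuming the common standardized-difference formula for proportions, (p1 - p2) / sqrt((p1(1 - p1) + p2(1 - p2)) / 2). The decision rule (flag the new source if its median absolute difference against the benchmark sources exceeds the largest difference seen among benchmark pairs) is a hypothetical choice, not Cohort Diagnostics behavior.

```python
import math
from itertools import combinations
from statistics import median


def std_diff(p1, p2):
    """Standardized difference between two proportions of a binary covariate."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled_var == 0:
        return 0.0 if p1 == p2 else math.inf
    return (p1 - p2) / math.sqrt(pooled_var)


def benchmark_diffs(prevalences):
    """|std diff| for every pair of benchmark data sources (45 pairs for 10 sources)."""
    return [abs(std_diff(a, b)) for a, b in combinations(prevalences, 2)]


def differs_from_benchmark(prevalences, new_p):
    """Hypothetical rule: flag if the new source's median |std diff| versus each
    benchmark source exceeds the largest |std diff| among benchmark pairs."""
    new_diffs = [abs(std_diff(new_p, p)) for p in prevalences]
    return median(new_diffs) > max(benchmark_diffs(prevalences))


# Made-up diabetes prevalence in the cohort across 10 benchmark data sources.
diabetes = [0.11, 0.12, 0.13, 0.13, 0.14, 0.14, 0.15, 0.15, 0.16, 0.16]

print(len(benchmark_diffs(diabetes)))          # 45
print(differs_from_benchmark(diabetes, 0.14))  # False
print(differs_from_benchmark(diabetes, 0.65))  # True
```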

gowthamrao commented 3 years ago

Tagging @clairblacketer @jreps as this idea came from a recent discussion

gowthamrao commented 3 years ago

Candidate Cohort level attributes:

Concept set: compare the resolved, mapped, and orphan concepts from one data source to another. Is there a data source whose resolved/mapped counts differ from those of the other data sources? Is this acceptable?
Counts of concepts in concept set: compare the relative distribution of counts of resolved, mapped, and orphan concepts with other data sources.
Index event breakdown: reports the concepts that were most likely to have allowed a subject to enter the cohort. Is this different across data sources?
Visit context: similar to above
Time series
Characterization
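For the concept set and index event attributes, one hypothetical way to quantify "is that different?" is to turn each data source's concept counts into proportions and compute the total variation distance between a new data source and the pooled benchmark. The concept labels, counts, and the choice of total variation distance are assumptions for illustration.

```python
from collections import Counter


def to_proportions(counts):
    """Convert {concept: count} into {concept: share of all index events}."""
    total = sum(counts.values())
    return {concept: n / total for concept, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical, 1 = disjoint)."""
    concepts = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in concepts)


def distance_from_pooled(benchmark_sources, new_source):
    """Distance between a new source's index event breakdown and the pooled benchmark."""
    pooled = Counter()
    for counts in benchmark_sources:
        pooled.update(counts)  # adds counts concept by concept
    return total_variation(to_proportions(pooled), to_proportions(new_source))


# Hypothetical index event counts by made-up concept label.
db1 = {"concept_A": 800, "concept_B": 150, "concept_C": 50}
db2 = {"concept_A": 700, "concept_B": 250, "concept_C": 50}
new_db = {"concept_A": 100, "concept_C": 900}

print(round(distance_from_pooled([db1, db2], db1), 2))     # 0.05
print(round(distance_from_pooled([db1, db2], new_db), 2))  # 0.85
```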

gowthamrao commented 3 years ago

Levels of diagnostics:

Fitness of a data source, which can span a collection of studies
Fitness of a study, which can span a collection of cohort definitions
Fitness of a cohort definition, which concerns a single cohort definition

gowthamrao commented 2 years ago

The cohort level determination is more granular: it helps determine whether a cohort definition faithfully extracts the right cohort from the population in the data source. A common example: because a data source has different data capture processes (e.g., coding practices), the cohort definition may not yield the right set of persons (e.g., because of orphan codes).

A practical use case arises when, within a specific network study, new data sources are incrementally added (new databases) or an existing data source is updated (new version). The cohort definitions used in the study were NOT evaluated on these new or updated databases.

What should we do in situations like these?
Could we define a set of go/no-go heuristics for cohort diagnostics that can be used as a first best guess? This might be a series of acceptance checks that are run every time the underlying data change, producing an alert or report with a red, orange, or green flag for each check.

This would help us identify scenarios where:

A database was previously accepted, but its new version is not.
A new database wants to participate in a study but, because of several red flags, is a candidate for exclusion from the study.

The focus is on study cohort + database diagnostics, with the intent of automating them into at least an alerting system that flags issues for review, instead of having to parse through all diagnostics for all cohorts each time there is a new or updated database.
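A minimal sketch of such an alerting report, assuming each check yields a single number where larger is worse, with hypothetical orange and red thresholds. The check names and thresholds below are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):
    GREEN = "green"
    ORANGE = "orange"
    RED = "red"


@dataclass
class Check:
    name: str
    value: float    # observed metric; larger means worse
    warn_at: float  # value >= warn_at -> ORANGE
    fail_at: float  # value >= fail_at -> RED

    def flag(self) -> Flag:
        if self.value >= self.fail_at:
            return Flag.RED
        if self.value >= self.warn_at:
            return Flag.ORANGE
        return Flag.GREEN


def relative_change(old, new):
    """Relative change in a metric between the previous and current database version."""
    return abs(new - old) / old if old else float("inf")


def go_no_go(checks):
    """'no-go' if any check is RED; otherwise 'go' (ORANGE checks are flagged for review)."""
    flags = {check.name: check.flag() for check in checks}
    decision = "no-go" if Flag.RED in flags.values() else "go"
    return decision, flags


# Hypothetical checks for one cohort on an updated database version.
checks = [
    Check("cohort_count_change", relative_change(12000, 3000), warn_at=0.2, fail_at=0.5),
    Check("orphan_concept_share", 0.08, warn_at=0.05, fail_at=0.20),
    Check("max_abs_std_diff", 0.05, warn_at=0.10, fail_at=0.20),
]
decision, flags = go_no_go(checks)
print(decision)  # no-go
for name, flag in flags.items():
    print(f"{name}: {flag.value}")
```

Running the same checks against both the previous and the new database version would surface the "previously accepted, now rejected" scenario directly.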

The initial set of diagnostic rules is documented here

azimov commented 1 month ago

@gowthamrao I would like to consider this as an additional "tag" on a cohort in the phenotype library, but how to do this is difficult because diagnostics are currently interpreted subjectively based on the output. E.g., unlike CohortMethod or SCCS, we don't have a binary pass or fail based on some numeric value, so inclusion here may be difficult.

OHDSI / CohortDiagnostics

Fit for use diagnostics - is a cohort definition implemented in a data source fit for use #480