OHDSI / Atlas

ATLAS is an open source software tool for researchers to conduct scientific analyses on standardized observational data
http://atlas-demo.ohdsi.org/
Apache License 2.0
273 stars 137 forks

Enhancement: Optionally generate cohort_details table with info about inclusion criteria and what triggered them #2891

Open TomWhite-MedStar opened 1 year ago

TomWhite-MedStar commented 1 year ago

Current State

Atlas can create very complex cohorts with multiple inclusion criteria. The generated reports can show attrition diagrams, or tables and graphics showing counts of patients who met each inclusion criterion.

However, the raw underlying data are not saved. It is not possible to export a table showing which inclusion criteria applied to each entry in the cohort, or when and why those inclusion rules were triggered (e.g. the datetime, relevant concept, and relevant values that allowed the entry to meet the inclusion criteria).

Yet that level of detail is often needed for follow-up analysis.

The hospital harm measures mentioned in #2886, for instance, define tight time criteria:

  1. HH-Hypoglycemia - find cases where, after a critically low glucose measurement (<= 40), there is no follow-up measurement within 5 minutes showing that the glucose was now >= 80.
  2. HH-Opioid Reversal Adverse Event - find cases where Narcan is administered within 12 hours of an opioid.

As we profile these across our multi-hospital system, I want to find near misses and understand where there is variability. For example, are there cases of glucose follow-up within 10 or 15 minutes rather than 5? Did the glucose not quite get up to 80 within those 5 minutes? What about cases where Narcan was needed within 18 hours?

Rather than creating many cohort variants, I'd like to make two: one that strictly meets the criteria, and another that casts a wider net. For the hypoglycemia measure, I might allow for a follow-up glucose within 8 hours. Then, once I generate that cohort, I'd like to review the frequency distribution of the time from the initial event (the critically low glucose level) to the inclusion event (when they got a follow-up), and what the follow-up glucose values were.
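To make the kind of review described here concrete, the following is a minimal sketch in Python. It assumes the wide-net cohort has already been exported as rows of (person_id, low glucose time, follow-up time, follow-up value); that row layout, the sample data, and the bucket limits are all hypothetical and not something Atlas produces today.

```python
from collections import Counter
from datetime import datetime

# Hypothetical export: (person_id, low_glucose_time, followup_time, followup_value).
# followup_time is None when no follow-up was found inside the wide window.
events = [
    (1, datetime(2023, 1, 1, 8, 0), datetime(2023, 1, 1, 8, 4), 95),
    (2, datetime(2023, 1, 1, 9, 0), datetime(2023, 1, 1, 9, 12), 88),
    (3, datetime(2023, 1, 2, 7, 30), datetime(2023, 1, 2, 7, 34), 72),
    (4, datetime(2023, 1, 3, 6, 0), None, None),
]


def minutes_between(start, end):
    """Minutes from the qualifying event to the follow-up, or None if absent."""
    return None if end is None else (end - start).total_seconds() / 60


def bucket(minutes):
    """Assign a follow-up delay to a coarse reporting bucket (limits are illustrative)."""
    if minutes is None:
        return "no follow-up"
    for limit in (5, 10, 15, 60):
        if minutes <= limit:
            return f"<= {limit} min"
    return "> 60 min"


# Frequency distribution of time from the low glucose to the follow-up.
distribution = Counter(bucket(minutes_between(low, fu)) for _, low, fu, _ in events)

# Near misses: follow-up arrived within 5 minutes, but glucose was still below 80.
near_miss_ids = [
    pid
    for pid, low, fu, value in events
    if fu is not None and minutes_between(low, fu) <= 5 and value < 80
]

print(distribution)
print(near_miss_ids)
```

With a per-event detail table available, this kind of profiling would not require a new cohort definition for every candidate time window.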

The way I have to do that today is to create custom SQL logic (often adapted from what is generated by OHDSI). This often requires multiple iterations as we test out different inclusion criteria and time ranges, so it is quite time consuming.

Desired Future State

Atlas already creates most of the needed staging tables as part of the generated SQL before dropping them (e.g. qualified_events, inclusion_0 -- inclusion_N, included_events, strategy_ends).

Theoretically, those tables could be (optionally) augmented to include fields per row to indicate which concept_id triggered the event (and when, via datetime), plus additional relevant attributes (like value, age, source_concept_id). Those standard attributes could be part of the inclusion_0 -- inclusion_N tables; and they could then be transposed so that each of those variables could be available (by inclusion number) in the included_events table.

In the end, what would help me most is a left join of the qualified_events with that new included_events table, including the separate timestamps for each (so that I know the timestamp of the qualifying event along with the timestamps of each inclusion event). That way I can build time-to-event values for inclusion metrics, plus profile the numeric values that allowed inclusions to happen. Then, by having two versions of such tables (one with exact criteria and a second with looser criteria), I could get better optics into those cases that nearly missed inclusion in the strict definition.
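The left join described above might look roughly like the sketch below. It uses Python's built-in sqlite3 module only so the example can be run anywhere; the columns event_id, start_datetime, start_time_0 and value_0 are assumed names for illustration and are not the actual schema of the Atlas staging tables. The julianday() date arithmetic is SQLite-specific and would differ on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified versions of the staging tables.
CREATE TABLE qualified_events (event_id INTEGER, person_id INTEGER, start_datetime TEXT);
CREATE TABLE included_events (event_id INTEGER, start_time_0 TEXT, value_0 REAL);

INSERT INTO qualified_events VALUES
  (1, 100, '2023-01-01 08:00:00'),
  (2, 101, '2023-01-01 09:00:00');
-- Only event 1 met inclusion rule 0.
INSERT INTO included_events VALUES (1, '2023-01-01 08:04:00', 95);
""")

rows = conn.execute("""
SELECT q.event_id,
       q.start_datetime,
       i.start_time_0,
       i.value_0,
       -- Time from the qualifying event to the inclusion event, in minutes.
       ROUND((julianday(i.start_time_0) - julianday(q.start_datetime)) * 24 * 60)
         AS minutes_to_inclusion_0
FROM qualified_events q
LEFT JOIN included_events i ON i.event_id = q.event_id
ORDER BY q.event_id
""").fetchall()

for row in rows:
    print(row)
```

Because it is a left join, qualifying events with no matching inclusion event are kept, with NULL (None in Python) in the inclusion columns.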

I know I can do this manually, but this is such a common need that I'd love to find a way to automate it, and take advantage of all the computation that Atlas is already doing along the way.

Detailed Request

  1. Add an option on the cohort design screen to generate the analytic table described below. By default, the table would be written to results.cohort_details_X, where X is the cohort ID.
  2. Have the qualified_events temp table use datetime (instead of just date), and include fields to indicate which concept_id triggered that event.
  3. Have each inclusion_N table follow a similar pattern, so that we know the start and end datetime for each inclusion event, along with the concept_id that triggered it. These tables should include associated attributes based upon the cohort definition. For example, if inclusions are based upon a measurement value, the actual measurement value that triggered inclusion should be included. The field naming should be logical, and include the name of the field and the inclusion number for reference.
  4. At the stage of creating the included_events temp table, union the inclusion_N tables, transposing them so that each included event for a given qualified event has variables for each of those attributes. For example, inclusion event details might be named start_time_N, value_N, value_as_concept_id_N, concept_id_N, etc., where N is the number of the inclusion event.
  5. Use a similar approach for strategy_ends, so that we know what triggers the end criteria and at what datetime.
  6. At the final stage of creating the final_cohort table, left join the qualified_events table with the included_events table. The final table should have one row per qualified event. For each qualified event, you would know which (if any) inclusion events were applied, and the supporting details of why the inclusions were triggered. This should also be left joined to the strategy_ends table for the same reason.
  7. Write the full final table to results.cohort_details_X
  8. Write the typical subset to results.cohort as usual.
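As a rough illustration of steps 3, 4 and 6, the sketch below builds a statement that widens a set of inclusion_N tables onto qualified_events, using the start_time_N / concept_id_N / value_N naming suggested above. Everything here is an assumption made for the example: the helper build_details_sql is hypothetical, the column layout of the inclusion tables is invented, the concept IDs are placeholders, and SQLite stands in for the real target database (so the table is not placed in a results schema). It also assumes at most one row per event in each inclusion table; otherwise the join would multiply rows.

```python
import sqlite3


def build_details_sql(n_rules, cohort_id):
    """Build SQL that left joins inclusion_0..inclusion_{n_rules-1} onto qualified_events."""
    select_cols = ["q.event_id", "q.person_id", "q.start_datetime"]
    joins = []
    for n in range(n_rules):
        # One set of suffixed columns per inclusion rule.
        select_cols += [
            f"i{n}.start_datetime AS start_time_{n}",
            f"i{n}.concept_id AS concept_id_{n}",
            f"i{n}.value AS value_{n}",
        ]
        joins.append(f"LEFT JOIN inclusion_{n} i{n} ON i{n}.event_id = q.event_id")
    return (
        f"CREATE TABLE cohort_details_{cohort_id} AS\n"
        f"SELECT {', '.join(select_cols)}\n"
        "FROM qualified_events q\n" + "\n".join(joins)
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE qualified_events (event_id INTEGER, person_id INTEGER, start_datetime TEXT);
CREATE TABLE inclusion_0 (event_id INTEGER, start_datetime TEXT, concept_id INTEGER, value REAL);
CREATE TABLE inclusion_1 (event_id INTEGER, start_datetime TEXT, concept_id INTEGER, value REAL);

INSERT INTO qualified_events VALUES
  (1, 100, '2023-01-01 08:00:00'),
  (2, 101, '2023-01-01 09:00:00');
-- Placeholder concept IDs (1001, 1002), not real vocabulary entries.
INSERT INTO inclusion_0 VALUES (1, '2023-01-01 08:04:00', 1001, 95);
INSERT INTO inclusion_1 VALUES (2, '2023-01-01 09:30:00', 1002, 0.4);
""")

conn.execute(build_details_sql(n_rules=2, cohort_id=42))

cursor = conn.execute("SELECT * FROM cohort_details_42 ORDER BY event_id")
columns = [d[0] for d in cursor.description]
details = cursor.fetchall()

print(columns)
for row in details:
    print(row)
```

The result has one row per qualified event, with NULLs in the column group of any inclusion rule that did not fire for that event.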

TomWhite-MedStar commented 1 year ago

With the above detail table as a foundation, users are likely to want certain calculations. For example:

It would be nice to have the option to specify variable names and their calculations, so that those calculations could be part of the final details table.

Ultimately, it would be nice to have more support for calculations directly within the cohort design (e.g. to look for a sequence of decreasing values, or to do calculations on whether drug dosing is adequate based upon the patient's weight and the dosage administered); but I presume that adding such logic would be much more complicated.
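For reference, the two calculations mentioned here are simple once the per-event values are available outside the cohort definition. The sketch below is only an illustration in Python: both function names are hypothetical, and the dose range used in the example is an arbitrary number, not a clinical recommendation.

```python
def is_strictly_decreasing(values):
    """True if there are at least two values and each is lower than the one before."""
    return len(values) >= 2 and all(a > b for a, b in zip(values, values[1:]))


def dose_in_range(dose_mg, weight_kg, min_mg_per_kg, max_mg_per_kg):
    """True if the weight-adjusted dose falls inside the given (illustrative) range."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    mg_per_kg = dose_mg / weight_kg
    return min_mg_per_kg <= mg_per_kg <= max_mg_per_kg


# Example: three successive measurements, ordered by time.
print(is_strictly_decreasing([90, 70, 55]))

# Example: 500 mg for a 50 kg patient is 10 mg/kg, checked against an arbitrary 8-12 range.
print(dose_in_range(500, 50, min_mg_per_kg=8, max_mg_per_kg=12))
```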