OHDSI / Atlas

ATLAS is an open source software tool for researchers to conduct scientific analyses on standardized observational data
http://atlas-demo.ohdsi.org/
Apache License 2.0

optionally use datetime in cohort definitions to support electronic clinical quality measures #2767

Open TomWhite-MedStar opened 2 years ago

TomWhite-MedStar commented 2 years ago

Several electronic clinical quality measure (eCQM) definitions specify times between events in hours. For example:

Other than the requirements to use hours, many such measures can be completely specified as Atlas cohorts; so extending Atlas in this fashion could accelerate the OMOP-FHIR digital quality measures efforts.

Is it possible to augment Atlas so that, in the date selection widgets, the word "days" becomes a button like "index end date", and clicking on it toggles between "days" and "hours"?

Then, when generating the SQL, the system would use the appropriate datetime fields instead of date for the widgets where "hours" was selected.

I recognize that this might also require a modification of the JSON representation to use "Hours" instead of "Days" for those widgets.

This enhancement request is related to #1980, but I am hoping it could be incorporated into 2.13 release rather than waiting for V3.0.

TomWhite-MedStar commented 2 years ago

Proactively, I expect one concern would be that many OMOP instances do not use datetime, so cohorts that truly require datetime would not work on those instances.

This might be an opportunity to enhance the DataQualityDashboard to add two checks of whether the OMOP instance:

  1. DATETIME is non-NULL and can be used in lieu of DATE: check that CAST(field_DATETIME as DATE) = field_DATE
  2. DATETIME is populated with actual DATETIME values: CAST(field_DATETIME as DATE) = field_DATE AND field_DATETIME > field_DATE
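
As a rough illustration of the two checks above, here is a minimal sketch that runs them against an in-memory SQLite table. It is not DataQualityDashboard code: the function name `datetime_checks` and the sample rows are made up for illustration, and SQLite's `date()` / `datetime()` functions stand in for `CAST(... AS DATE)`, which behaves differently in SQLite than in most CDM databases.

```python
import sqlite3

def datetime_checks(conn, table, date_col, datetime_col):
    """Return two fractions of rows (hypothetical helper, not part of DQD):
    1. DATETIME is non-NULL and falls on the same day as DATE
    2. DATETIME additionally carries a time of day later than midnight
    """
    sql = f"""
        SELECT
          AVG(CASE WHEN {datetime_col} IS NOT NULL
                    AND date({datetime_col}) = date({date_col})
                   THEN 1.0 ELSE 0.0 END),
          AVG(CASE WHEN date({datetime_col}) = date({date_col})
                    AND datetime({datetime_col}) > datetime({date_col})
                   THEN 1.0 ELSE 0.0 END)
        FROM {table}
    """
    return conn.execute(sql).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit_occurrence (visit_start_date TEXT, visit_start_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?)",
    [
        ("2020-01-01", "2020-01-01 08:30:00"),  # real time of day
        ("2020-01-02", "2020-01-02 00:00:00"),  # midnight default
        ("2020-01-03", None),                   # datetime missing
        ("2020-01-04", "2020-01-05 10:00:00"),  # datetime disagrees with date
    ],
)
consistent, has_time = datetime_checks(
    conn, "visit_occurrence", "visit_start_date", "visit_start_datetime"
)
print(consistent, has_time)
```

A high first fraction with a low second fraction would suggest the instance fills DATETIME with midnight defaults, so hour-based cohort logic would not be meaningful there.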

TomWhite-MedStar commented 1 year ago

@alex-odysseus or @chrisknoll , can you provide a status update on this feature request?

chrisknoll commented 1 year ago

This is extremely complicated: we don't know the granularity of the data in question (i.e., is it by month, by day, by hour, by minute, by second), so there are challenges in reconciling differences in the precision of the data captured between records.

Example: a person's age can be at the level of year, or year-month, but rarely at year-month-day, so we currently assume year, and we can only calculate age at an event by doing a year comparison. This is just a known limitation of the tool.

Survey data is sometimes captured by the month, but typically all other data has at least the year-month-day. In this context, we build criteria based on by-day time (past 30d vs. past 30 minutes). I don't think we have any plans to make the change, because I think it's a very niche case where we're dealing with hours/minutes/seconds when doing longitudinal studies. But as soon as it becomes a widespread problem, I think we'll have to re-think how we should handle this (as in: CDM source ETLs must define their events with an hour-minute-second value, which can default to 00:00:00 (midnight), so that we can then operate on the level of hours/minutes/seconds in cohort definitions).
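
To make the precision concern concrete, here is a small sketch with made-up timestamps. It assumes an index event recorded with a full datetime and a second event whose time of day was unknown and defaulted to midnight by the ETL. The helper names `within_days_after` and `within_hours_after` are hypothetical and do not correspond to anything in Atlas or CIRCE.

```python
from datetime import datetime, timedelta

# Index event (e.g. an admission) recorded with a real time of day.
index_start = datetime(2023, 5, 1, 23, 0)
# Second event known only to the day; the ETL defaulted the time to midnight.
event_midnight_default = datetime(2023, 5, 1, 0, 0)

def within_days_after(index, event, days):
    """Day-granularity window: compare calendar dates only."""
    return 0 <= (event.date() - index.date()).days <= days

def within_hours_after(index, event, hours):
    """Hour-granularity window: compare full datetimes."""
    return timedelta(0) <= event - index <= timedelta(hours=hours)

same_day = within_days_after(index_start, event_midnight_default, 0)
within_2h = within_hours_after(index_start, event_midnight_default, 2)
print(same_day, within_2h)
```

The day-level comparison matches, while the hour-level comparison fails only because of the midnight default: the event appears to precede the index by 23 hours even though its true time is unknown.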

TomWhite-MedStar commented 1 year ago

@chrisknoll , although I hear your concern, this capability would be high value for EHR data, and electronic clinical quality measures. Many (such as the ones listed above) expect data in datetime format, and do calculations based upon time between measurements, or time from admission to treatment, or time from treatment/measurement to discharge.

So, would it be technically feasible to make that sort of modification to Atlas, and then have SqlRender logic (like --{ through --}) to tailor existing WebAPI SQL to use the datetime field instead of the date field where appropriate?

If you can guide me, I can prototype some SQL changes, but I have no idea what is needed to modify the Atlas GUI.

chrisknoll commented 1 year ago

All this logic is contained in CIRCE, so it's not a WebAPI exercise; WebAPI calls out to CIRCE to build the SQL. And this change feels like we're changing some fundamental assumptions of cohort definitions (i.e., day granularity), so I'm very hesitant to suggest diving in on this without some more input on how the CDM group is going to tackle these issues (tagging @clairblacketer), and also some guidance on the approach from @pbr6cornell.

But, if you want to take a look at the source for these queries, you can look at the following classes:

CriteriaGroup: This contains the 'wrapper' that performs the AND/OR between the individual CorrelatedCriteria.

CorrelatedCriteria: This contains the fields that let you specify the Criteria and the Window options (i.e., starting between 365d prior and 1d prior to index). Note this extends WindowedCriteria because there's a use case to just find records relative to the date and not return a boolean if there were enough counts.

Window: This is the class that stores the Endpoint for the start and end of the window. Endpoints just store the days and whether the offset is - or +.

The Window and Endpoint are where you need to be concerned. This is where the day-granularity is assumed. If we want to let the users make choices about whether it's days, weeks, minutes or seconds, this is where these choices would be stored.
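
As a rough illustration of what storing a unit choice on the window endpoints could look like, here is a sketch in Python (CIRCE itself is Java). Everything here is an assumption: the `unit` field, the `"Hours"` JSON key suggested earlier in this thread, and the `"Days"` / `"Coeff"` key names are used for illustration only and are not taken from the CIRCE source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Endpoint:
    value: Optional[int]  # offset size; None means unbounded
    coeff: int            # -1 = before index, +1 = after index
    unit: str = "day"     # hypothetical new property; "day" keeps today's behavior

    @classmethod
    def from_json(cls, obj):
        # Hypothetical: an "Hours" key selects hour granularity;
        # otherwise fall back to the day-based form.
        if "Hours" in obj:
            return cls(obj["Hours"], obj["Coeff"], "hour")
        return cls(obj.get("Days"), obj["Coeff"], "day")

@dataclass
class Window:
    start: Endpoint
    end: Endpoint

    def needs_datetime(self):
        """True if any endpoint is finer than day granularity."""
        return any(e.unit != "day" for e in (self.start, self.end))

window = Window(
    start=Endpoint.from_json({"Days": 365, "Coeff": -1}),
    end=Endpoint.from_json({"Hours": 6, "Coeff": 1}),
)
print(window.needs_datetime())
```

Defaulting the unit to days means existing cohort definitions would deserialize unchanged, and only windows that opt into hours would require datetime columns.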

That is the data-model part of the puzzle, but the other is how the model is used to build the query.

CohortExpressionQueryBuilder.getCriteriaGroupQuery() is where it starts, and it calls out to getCorelatedCriteriaQuery() for each corelated criteria in the group. getCorelatedCriteriaQuery is where we say 'with at least 1 ... between 365d before and 1d before index', and getCriteriaGroupQuery is where we say '{corelatedQuery1} AND {corelatedQuery2} AND {corelatedQuery3}'.

I have no doubt that we can add properties to the model and generate the SQL to handle those properties, doing the right casting into datetime and using the appropriate date_add({unit},value) time function in SQL. But that's really 1% of the overall problem of how times are represented in the CDM, and what additional steps we will need to handle, such as whether we can even compare 2 records together when they don't have the same date precision.
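
The SQL-generation half of that idea might look roughly like the sketch below. It is not CIRCE code: the function names, the table aliases `A` and `P`, and the column names `START_DATE` / `START_DATETIME` are placeholders, and the SQL Server style `DATEADD(unit, value, column)` form is assumed as the target syntax.

```python
def offset_expr(value, coeff, unit, anchor):
    """Build DATEADD(unit, signed_value, anchor) for one window endpoint."""
    return f"DATEADD({unit},{coeff * value},{anchor})"

def window_predicate(start, end, unit, event_alias="A", index_alias="P"):
    """Build the window condition for a correlated criteria.

    start and end are (value, coeff) pairs; unit is 'day' or 'hour'.
    Day windows compare the date columns; finer units switch to datetime.
    """
    column = "START_DATE" if unit == "day" else "START_DATETIME"
    event_col = f"{event_alias}.{column}"
    index_col = f"{index_alias}.{column}"
    lower = offset_expr(start[0], start[1], unit, index_col)
    upper = offset_expr(end[0], end[1], unit, index_col)
    return f"{event_col} >= {lower} AND {event_col} <= {upper}"

# Between 365 days before and 1 day before index (current day-based behavior).
day_sql = window_predicate((365, -1), (1, -1), "day")
# Between 0 and 6 hours after index (hypothetical hour-based window).
hour_sql = window_predicate((0, 1), (6, 1), "hour")
print(day_sql)
print(hour_sql)
```

This covers only the mechanical part. It does nothing about the harder question raised above of whether two records with different precision can be meaningfully compared.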