OHDSI / SqlRender

This is an R package and Java library for rendering parameterized SQL, and translating it to different SQL dialects.
https://ohdsi.github.io/SqlRender
Other
80 stars 77 forks source link

Feature request: Add support for datetime calculations #345

Closed TomWhite-MedStar closed 5 months ago

TomWhite-MedStar commented 11 months ago

Multiple US electronic Clinical Quality Measure (eCQM) are requiring datetime calculations. I've successfully prototyped three of them in Atlas, and then modified the generated SQL to use datetime logic on Databricks.

If we get agreement from the Atlas and WebApi teams to implement enhancements on their ends, what is the feasibility of adding datetime logic conversions to SqlRender?

I took logic that works for Databricks Spark, and ask GPT-4 to translated it to the other supported languages. Other than Impala and Hive, GPT-4 provided a direct translation that looks for Impala and Hive, it suggested a work-around).

Here is the sample query in Spark SQL:

CREATE OR REPLACE TEMPORARY VIEW new_datetime_values AS
WITH cte as (
  SELECT cast(to_date('2022-12-31','yyyy-MM-dd') as timestamp) as start_datetime
)
SELECT
  start_datetime,
  start_datetime + INTERVAL 15 SECONDS AS plus_15_seconds,
  start_datetime + INTERVAL 10 MINUTES AS plus_10_minutes,
  start_datetime + INTERVAL 5 HOURS AS plus_5_hours,
  start_datetime + INTERVAL 2 DAYS AS plus_2_days
FROM 
  cte;

SELECT
  start_datetime,
  plus_15_seconds,
  UNIX_TIMESTAMP(plus_15_seconds) - UNIX_TIMESTAMP(start_datetime) AS diff_in_seconds, -- Direct difference in seconds
  plus_10_minutes,
  (UNIX_TIMESTAMP(plus_10_minutes) - UNIX_TIMESTAMP(start_datetime)) / 60 AS diff_in_minutes, -- Convert seconds to minutes
  plus_5_hours,
  (UNIX_TIMESTAMP(plus_5_hours) - UNIX_TIMESTAMP(start_datetime)) / 3600 AS diff_in_hours, -- Convert seconds to hours
  plus_2_days,
  (UNIX_TIMESTAMP(plus_2_days) - UNIX_TIMESTAMP(start_datetime)) / (3600 * 24) AS diff_in_days -- Convert seconds to days
FROM 
  new_datetime_values;

And here is comparable code in native MS SQL syntax:

CREATE  TABLE #datetimes (start_datetime DATETIME);
INSERT INTO #datetimes (start_datetime) VALUES ('2022-12-31');

WITH cte as (
SELECT
  start_datetime,
  DATEADD(SECOND, 15, start_datetime) AS plus_15_seconds,
  DATEADD(MINUTE, 10, start_datetime) AS plus_10_minutes,
  DATEADD(HOUR, 5, start_datetime) AS plus_5_hours,
  DATEADD(DAY, 2, start_datetime) AS plus_2_days
FROM #datetimes
)
SELECT
  start_datetime,
  plus_15_seconds,
  DATEDIFF(SECOND, start_datetime, plus_15_seconds) AS diff_in_seconds,
  plus_10_minutes,
  DATEDIFF(MINUTE, start_datetime, plus_10_minutes) AS diff_in_minutes,
  plus_5_hours,
  DATEDIFF(HOUR, start_datetime, plus_5_hours) AS diff_in_hours,
  plus_2_days,
  DATEDIFF(DAY, start_datetime, plus_2_days) AS diff_in_days
FROM 
  cte;
schuemie commented 11 months ago

Yes, sounds like a great idea. As always, the idea would be to use SQL Server SQL as the starting point, and translate to the various dialects from there.

It will be quite a lot of work though, given the fact we have to support all these platforms. This might be something to work on during the OHDSI Symposium hackathon.

TomWhite-MedStar commented 11 months ago

I'd be interested in helping at the Symposium hackathon. Where can I find developer documentation or guidance? Additionally, is there a recommended IDE if custom Java development is needed?

I presume someone would start by testing whether this can be implemented via a new set of replacement patterns (and write test cases for them). And if that isn't adequate, custom Java code might be needed.

schuemie commented 11 months ago

There are generic guidelines here and a style guide here.

The translation rules mechanism is pretty versatile. I would be surprised if we needed custom Java code for this. There's no preferred editor for Java (I use Eclipse out of habit).

I would probably attack this issue like this:

  1. Figure out correct syntax in each DBMS. (probably test them on the platforms themselves just to make sure)
  2. Write unit tests for the translations
  3. Add translation patterns until unit tests pass.

This line of code is your friend, as it allows you reload the translation rules without having to rebuild or reload the package.

TomWhite-MedStar commented 11 months ago

@schuemie , thanks.

Turns out Spark now supports nearly the same syntax as used in MS SQL, so no translation to using UNIX_TIMESTAMP() is needed. The only need is to swap start end end dates. MS SQL uses datediff(unit, startdate, enddate). Spark uses datediff(unit, enddate, startdate)

CREATE OR REPLACE TEMPORARY VIEW new_datetime_values AS
WITH cte as (
  SELECT cast(to_date('2022-12-31','yyyy-MM-dd') as timestamp) as start_datetime
)
SELECT
  start_datetime,
  DATEADD(SECOND, 15, start_datetime) AS plus_15_seconds,
  DATEADD(MINUTE, 10, start_datetime) AS plus_10_minutes,
  DATEADD(HOUR, 5, start_datetime) AS plus_5_hours,
  DATEADD(DAY, 2, start_datetime) AS plus_2_days
FROM 
  cte;

SELECT
  start_datetime,
  plus_15_seconds,
  DATEDIFF(SECOND, start_datetime, plus_15_seconds) AS diff_in_seconds, 
  plus_10_minutes,
  DATEDIFF(MINUTE, start_datetime, plus_10_minutes) AS diff_in_minutes,
  plus_5_hours,
  DATEDIFF(HOUR, start_datetime, plus_5_hours) AS diff_in_hours, 
  plus_2_days,
  DATEDIFF(DAY, start_datetime, plus_2_days) AS diff_in_days
FROM 
  new_datetime_values;
TomWhite-MedStar commented 11 months ago

@schuemie , thanks for the guidance. Turns out SqlRender already supported DATEADD() of non-day units. I added support for DATEDIFF() of non-day units (for Spark), and submitted a pull request.