OHDSI / circe-be

CIRCE is a cohort definition and syntax compiler tool for OMOP CDMv5
Apache License 2.0
9 stars 13 forks source link

fixed bug during cohort generation in netezza #181

Closed ssuvorov-fls closed 2 years ago

ssuvorov-fls commented 2 years ago

fixes https://github.com/OHDSI/WebAPI/issues/2124

chrisknoll commented 2 years ago

Hi, Everyone, Patrick recently pointed me to an alteration of the 'erify' logic which could be done by summing up the event_types (which are -1 for starts, +1 for ends) and when the sum of priors adds up to 0, you have an end. I think this is more performant than the current approach, and may avoid the inner-outer row_number() issue, so I'd like to investigate that. So let me hold onto this PR for a day or two, let me try to propose an alternative query for it, and maybe we can try it on netezza to see if it works.

chrisknoll commented 2 years ago

Ok, Here's the changes I'd like to ask you to add to this to perform the erify logic without need of rownumbers:

Origional:

    FROM
    (
      SELECT
        person_id
        , event_date
        , event_type
        , MAX(start_ordinal_inner) OVER (PARTITION BY person_id ORDER BY event_date, event_type, start_ordinal_inner ROWS UNBOUNDED PRECEDING) AS start_ordinal
        , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY event_date, event_type, start_ordinal_inner) AS overall_ord
      FROM
      (
        SELECT
          person_id
          , start_date AS event_date
          , -1 AS event_type
          , ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date) AS start_ordinal_inner
        FROM #cohort_rows

        UNION ALL

        SELECT
          person_id
          , DATEADD(day,@eraconstructorpad,end_date) as end_date
          , 1 AS event_type
          , NULL
        FROM #cohort_rows
      ) RAWDATA
    ) e
    WHERE (2 * e.start_ordinal) - e.overall_ord = 0

New version:

    FROM
    (
      SELECT
        person_id
        , event_date
        , SUM(event_type) OVER (PARTITION BY person_id ORDER BY event_date, event_type ROWS UNBOUNDED PRECEDING) AS interval_status
      FROM
      (
        SELECT
          person_id
          , start_date AS event_date
          , -1 AS event_type
        FROM #cohort_rows

        UNION ALL

        SELECT
          person_id
          , DATEADD(day,@eraconstructorpad,end_date) as end_date
          , 1 AS event_type
        FROM #cohort_rows
      ) RAWDATA
    ) e
    WHERE interval_status = 0

The change: we remove the two row_number() columns that were used to determine when end occurred (ie: WHERE (2 * e.start_ordinal) - e.overall_ord = 0). Instead, we SUM() the event_type together from the prior rows when ordering by the event_date, event_type (-1 will appear before +1). When we sum up the event types and get 0, that means every start has been closed by every end and therefore we have an 'end of era' date.

ssuvorov-fls commented 2 years ago

Hi, @chrisknoll It works fine, thanks for your suggestion