OHDSI / Atlas

ATLAS is an open source software tool for researchers to conduct scientific analyses on standardized observational data
http://atlas-demo.ohdsi.org/
Apache License 2.0
273 stars 137 forks source link

Atlas Cohort issue - cohort with certain procedure in the same visit with a certain diagnosis #454

Closed gowthamrao closed 6 years ago

gowthamrao commented 7 years ago

Expected behavior

Be able to create a cohort that had a Percutaneous coronary intervention (procedure) in the same visit where they also had an inpatient visit with a primary diagnosis non-st myocardial infarction

Actual behavior

Be able to create a cohort that had a Percutaneous coronary intervention (procedure) x-days before and x-days after an inpatient visit with a primary diagnosis of non-st myocardial infarction

Steps to reproduce behavior

http://www.ohdsi.org/web/atlas/#/cohortdefinition/922794

condition and procedure must have the same visit_occurrence_id. visit must be inpatient visit.

gowthamrao commented 7 years ago

@chrisknoll is current Atlas able to do 'expected behavior' right now?

chrisknoll commented 7 years ago

@gowthamrao : no, it's based on the temporal proximity (as you describe in Actual Results). We could tag the cohort events that are selected with the visit_id that the event came from, and leverage that when looking for the correlated events (at least 1 of X during the same visit...). My concern here is that visit standardization is sort of a problem, as you know from the discussions about changing the visit structure in the CDM. It's more consistent when talking across a network of CDMs to say that you're looking at a time context (time is fairly well standardized across all environments :) ). Everyone agrees that when you say something happened within X days, we're all talking about the same time window.

However, some cases have many different visits within the same hospital due to moving between departments while others consider the stay in the same location the visit. Since the meaning of a visit can vary, you can't really say consistently 'received procedure at the same visit as the diagnosis'.

gowthamrao commented 7 years ago

Correct -- the distinction is the functionality between nested query vs additional event criteria. Both appear to achieve similar functionality.

Could nested query be made to imply -- nested within the same visit record (i.e. join by visit-id) vs 'additional event criteria' to be the temporal?

chrisknoll commented 7 years ago

Could nested query be made to imply -- nested within the same visit record (i.e. join by visit-id) vs 'additional event criteria' to be the temporal?

No, the nested query is just to establish a new index to anchor off for the nested correlated criteria...So that you can do something like:

Had a diagnosis-> Which had a procedure within 30d after diagnosis Which had a follow up visit within 30d after procedure

In cases I've used this, it was explicitly to look beyond the current visit, often finding things that occur at a subsequent visit. Looking backwards in time has the same effect.

If the need for looking for things that occur during the same visit, then I think the way forward is to just add the visit_id to the events and support comparing visit_ids to each other if the user wants that, they can tick a UI checkbox to restrict to same visit.

gowthamrao commented 7 years ago

If the need for looking for things that occur during the same visit, then I think the way forward is to just add the visit_id to the events and support comparing visit_ids to each other if the user wants that, they can tick a UI checkbox to restrict to same visit.

Yes - I think that would work, let me think thru that. How would we handle a table like our COST table -- that has a cost_event_id = visit_occurrence_id where cost_domain_id = 'Visit'

chrisknoll commented 7 years ago

Not sure, I hadn't thought about how to define a cohort based on costs of things...my thoughts about employing a cost table would be at analysis time where you have a cohort of people (let's say, presence in an inpatient setting' and then the analysis would be to sum up the costs that existed during that stay...or if you wanted to know what the costs were to a person while undergoing a set of procedures, I'd create a cohort where the person is subject to the procedures (and thanks to cohort era-fy, we can chain procedures together) and once we have those time windows, we can see what costs they were subjected to during the duration...my point is that I hadn't really thought about including people in a cohort based on expenses. I'd have to think about that as well.

gowthamrao commented 7 years ago

I think the way forward is to just add the visit_id to the events and support comparing visit_ids to each other if the user wants that, they can tick a UI checkbox to restrict to same visit.

That means using visit_occurrence_id in the join when the user selects the checkbox to restrict to same visit. We need some help understanding where and how that would work. If we take this example http://www.ohdsi.org/web/atlas/#/cohortdefinition/922810 image

the three criteria queries are a UNION of each other, i.e. either of them should be present

CREATE TEMP TABLE qualified_events

AS
WITH primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date)  AS (
-- Begin Primary Events
select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM
(
  select P.person_id, P.start_date, P.end_date, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date ASC) ordinal
  FROM 
  (
  -- Begin Visit Occurrence Criteria
select C.person_id, C.visit_occurrence_id as event_id, C.visit_start_date as start_date, C.visit_end_date as end_date, C.visit_concept_id as TARGET_CONCEPT_ID
from 
(
  select vo.*, ROW_NUMBER() over (PARTITION BY vo.person_id ORDER BY vo.visit_start_date, vo.visit_occurrence_id) as ordinal
  FROM @cdm_database_schema.VISIT_OCCURRENCE vo
where vo.visit_concept_id in (SELECT concept_id from  Codesets where codeset_id = 2)
) C

-- End Visit Occurrence Criteria

UNION ALL
-- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria

UNION ALL
-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

WHERE C.condition_type_concept_id in (45756900,45756835)
-- End Condition Occurrence Criteria

  ) P
) P
JOIN @cdm_database_schema.observation_period OP on P.person_id = OP.person_id and P.start_date >=  OP.observation_period_start_date and P.start_date <= op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0*INTERVAL'1 day') <= P.START_DATE AND (P.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE AND P.ordinal = 1
-- End Primary Events

)
 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

;

if the user selects in the UI the option to restrict to the same visit_id - then the query should be modified to

CREATE TEMP TABLE qualified_events

AS
WITH primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date)  AS (
-- Begin Primary Events
select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM
(
  select P.person_id, P.start_date, P.end_date, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date ASC) ordinal
  FROM  
  (

(
  -- Begin Visit Occurrence Criteria
select C.person_id, C.visit_occurrence_id as event_id, C.visit_start_date as start_date, C.visit_end_date as end_date, C.visit_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id as visit_id
from 
(
  select vo.*, ROW_NUMBER() over (PARTITION BY vo.person_id ORDER BY vo.visit_start_date, vo.visit_occurrence_id) as ordinal
  FROM @cdm_database_schema.VISIT_OCCURRENCE vo
where vo.visit_concept_id in (SELECT concept_id from  Codesets where codeset_id = 2)
) C

-- End Visit Occurrence Criteria
) C1

JOIN

(
-- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id as visit_id
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria
) C2
on C1.person_id = C2.person_id and C1.visit_id = C2.visit_id

JOIN
(
-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID, C.visit_occurrence_id as visit_id
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

WHERE C.condition_type_concept_id in (45756900,45756835)
-- End Condition Occurrence Criteria
) C3
on C1.person_id = C3.person_id and C1.visit_id = C3.visit_id
  ) P  
) P
JOIN @cdm_database_schema.observation_period OP on P.person_id = OP.person_id and P.start_date >=  OP.observation_period_start_date and P.start_date <= op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0*INTERVAL'1 day') <= P.START_DATE AND (P.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE AND P.ordinal = 1
-- End Primary Events

)
 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

;

is this a correct assessment @chrisknoll ?

The changes are

chrisknoll commented 7 years ago

Hi, @gowthamrao . Nice write up, it's definitely true we want to add the visit_occurence_id to the select list. But the rest of what you propose isn't quite right. Let me take a step back:

Remember that a cohort definition is about finding people that match criteria for a specific time window. The way you are framing the use of the 'primary events' is 'Find me the people that have Visit, Procedure, Condition in the same visit'. That's not a time window, that's more of a 'case'.

Primary events are used to select the start dates that can indicate when a person enters the cohort. The reason why Primary events UNION by default is that you can say something like 'Starting with either a procdure X or diagnosis Y or treatment Z....' You can do more complex rules around each of those primary events using nested criteria, but let's keep it simple for now.

Notice that it wouldn't make sense to say 'Starting with procedure x and diagnosis Y and treatment Z'...which one is the start? The AND makes it look like you want to use all the start dates together somehow. So, that's not how primary events work..primary events are always a selection of events ie: an UNION.

This is how I think you'd want to frame your example:

Starting with diagnosis X (the primary event) Having at least 1: A procedure occurrence Y starting within all days before and all days after diagnosis within the same visit.

This will start the person in the cohort on their diagnosis date, if they also have a procedure Y in the same visit. Person will leave cohort based on the cohort exit strategy and censoring events you specify later.

So the key to looking for correlated events is to look at how the query is built to perform the correlated query. Go back to your example, and this time just pick a single primary event (the condition, just to make it simple), and then click the 'Add qualifying criteria' button. This will reveal the correlated query UI to specify that you're looking for an associated event relative to the primary event. The current state only let's you look for the procedure within a timeframe. What you want to add is to optionally specify that the event should be restricted within the same visit as the primary event.

After you add the qualifying event, look at the generated sql, and from there try to determine the changes that need to be made to the query.

gowthamrao commented 7 years ago

Thank you for explaining these points

cohort definition is about finding people that match criteria for a specific time window.

'Find me the people that have Visit, Procedure, Condition in the same visit'. That's not a time window, that's more of a 'case'.

Primary events are used to select the start dates that can indicate when a person enters the cohort.

primary events are always a selection of events ie: an UNION.

Following your advise

Starting with diagnosis X (the primary event) Having at least 1: A procedure occurrence Y starting within all days before and all days after diagnosis within the same visit.

I think we have two options:

  1. Using the 'Add qualifying criteria' button.
  2. Using Nested criteria within primary event.
gowthamrao commented 7 years ago

Option 1: Using the 'Add qualifying criteria' button.

The first approach is http://www.ohdsi.org/web/atlas/#/cohortdefinition/922831 image

The generated SQL has three main sections

1. qualified_events


The output is event_id, person_id, start_date, end_date, op_start_date, op_end_date

(original SQL)

CREATE TEMP TABLE qualified_events

AS
WITH primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date)  AS (
-- Begin Primary Events
select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM
(
  select P.person_id, P.start_date, P.end_date, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date ASC) ordinal
  FROM 
  (
  -- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria

  ) P
) P
JOIN @cdm_database_schema.observation_period OP on P.person_id = OP.person_id and P.start_date >=  OP.observation_period_start_date and P.start_date <= op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0*INTERVAL'1 day') <= P.START_DATE AND (P.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE
-- End Primary Events

)
 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

;

Other considerations/thoughts for qualified_events:

2. inclusionRuleCohorts


The output is inclusion_rule_id, person_id, event_id

(original SQL)

CREATE TEMP TABLE inclusionRuleCohorts 
 (inclusion_rule_id bigint,
  person_id bigint,
  event_id bigint
)
;
INSERT INTO inclusionRuleCohorts (inclusion_rule_id, person_id, event_id)
select 0 as inclusion_rule_id, person_id, event_id
FROM 
(
  select pe.person_id, pe.event_id
  FROM qualified_events pe

JOIN (
-- Begin Criteria Group
select 0 as index_id, person_id, event_id
FROM
(
  select E.person_id, E.event_id 
  FROM qualified_events E
  LEFT JOIN
  (
    -- Begin Correlated Criteria
SELECT 0 as index_id, p.person_id, p.event_id
FROM qualified_events P
LEFT JOIN
(
  -- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria

) A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE
GROUP BY p.person_id, p.event_id
HAVING COUNT(A.TARGET_CONCEPT_ID) >= 1
-- End Correlated Criteria

  ) CQ on E.person_id = CQ.person_id and E.event_id = CQ.event_id
  GROUP BY E.person_id, E.event_id
  HAVING COUNT(index_id) = 1
) G
-- End Criteria Group
) AC on AC.person_id = pe.person_id AND AC.event_id = pe.event_id
) Results
;

Other considerations/thoughts for inclusionRuleCohorts:

3. included_events


The output is event_id, person_id, start_date, end_date, op_start_date, op_end_date

(original SQL)

CREATE TEMP TABLE included_events

AS
WITH cteIncludedEvents(event_id, person_id, start_date, end_date, op_start_date, op_end_date, ordinal)  AS (
  SELECT event_id, person_id, start_date, end_date, op_start_date, op_end_date, row_number() over (partition by person_id order by start_date ASC) as ordinal
  from
  (
    select Q.event_id, Q.person_id, Q.start_date, Q.end_date, Q.op_start_date, Q.op_end_date, SUM(coalesce(POWER(cast(2 as bigint), I.inclusion_rule_id), 0)) as inclusion_rule_mask
    from qualified_events Q
    LEFT JOIN inclusionRuleCohorts I on I.person_id = Q.person_id and I.event_id = Q.event_id
    GROUP BY Q.event_id, Q.person_id, Q.start_date, Q.end_date, Q.op_start_date, Q.op_end_date
  ) MG -- matching groups

  -- the matching group with all bits set ( POWER(2,# of inclusion rules) - 1 = inclusion_rule_mask
  WHERE (MG.inclusion_rule_mask = POWER(cast(2 as bigint),1)-1)

)
 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
cteIncludedEvents Results
WHERE Results.ordinal = 1
;

So the changes are to capture the visit_occurrence_id corresponding to qualified_events and inclusionRuleCohorts in temporary tables and use them in the joins? Thoughts?

gowthamrao commented 7 years ago

Option 2: Using Nested criteria within primary event.

The second approach maybe using NESTED queries within primary events. http://www.ohdsi.org/web/atlas/#/cohortdefinition/922830 image

The generated SQL is

CREATE TEMP TABLE qualified_events

AS
WITH primary_events (event_id, person_id, start_date, end_date, op_start_date, op_end_date)  AS (
-- Begin Primary Events
select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM
(
  select P.person_id, P.start_date, P.end_date, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY start_date ASC) ordinal
  FROM 
  (
  select PE.person_id, PE.event_id, PE.start_date, PE.end_date, PE.target_concept_id FROM (
-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria

) PE
JOIN (
-- Begin Criteria Group
select 0 as index_id, person_id, event_id
FROM
(
  select E.person_id, E.event_id 
  FROM (SELECT Q.person_id, Q.event_id, Q.start_date, Q.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM (-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria
) Q
JOIN @cdm_database_schema.OBSERVATION_PERIOD OP on Q.person_id = OP.person_id 
  and OP.observation_period_start_date <= Q.start_date and OP.observation_period_end_date >= Q.start_date
) E
  LEFT JOIN
  (
    -- Begin Correlated Criteria
SELECT 0 as index_id, p.person_id, p.event_id
FROM (SELECT Q.person_id, Q.event_id, Q.start_date, Q.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM (-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria
) Q
JOIN @cdm_database_schema.OBSERVATION_PERIOD OP on Q.person_id = OP.person_id 
  and OP.observation_period_start_date <= Q.start_date and OP.observation_period_end_date >= Q.start_date
) P
LEFT JOIN
(
  -- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria

) A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE
GROUP BY p.person_id, p.event_id
HAVING COUNT(A.TARGET_CONCEPT_ID) >= 1
-- End Correlated Criteria

  ) CQ on E.person_id = CQ.person_id and E.event_id = CQ.event_id
  GROUP BY E.person_id, E.event_id
  HAVING COUNT(index_id) = 1
) G
-- End Criteria Group
) AC on AC.person_id = pe.person_id and AC.event_id = pe.event_id

  ) P
) P
JOIN @cdm_database_schema.observation_period OP on P.person_id = OP.person_id and P.start_date >=  OP.observation_period_start_date and P.start_date <= op.observation_period_end_date
WHERE (OP.OBSERVATION_PERIOD_START_DATE + 0*INTERVAL'1 day') <= P.START_DATE AND (P.START_DATE + 0*INTERVAL'1 day') <= OP.OBSERVATION_PERIOD_END_DATE AND P.ordinal = 1
-- End Primary Events

)
 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

;

User selects an option that would be something like this image

Selection would modify the generated SQL only in the correlated-criteria. This is the portion of the original SQL that would be modified


  (
    -- Begin Correlated Criteria
SELECT 0 as index_id, p.person_id, p.event_id
FROM (SELECT Q.person_id, Q.event_id, Q.start_date, Q.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM (-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria
) Q
JOIN @cdm_database_schema.OBSERVATION_PERIOD OP on Q.person_id = OP.person_id 
  and OP.observation_period_start_date <= Q.start_date and OP.observation_period_end_date >= Q.start_date
) P
LEFT JOIN
(
  -- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria

) A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE
GROUP BY p.person_id, p.event_id
HAVING COUNT(A.TARGET_CONCEPT_ID) >= 1
-- End Correlated Criteria

  ) 

user selection will modify the correlated-queries portion of the SQL:

  1. Modify all @criteriaQueries to use visit_occurrence_id. , C.visit_occurrence_id as visit_id
  2. Join by visit_id A.visit_id = P.visit_id

Final SQL

  (
    -- Begin Correlated Criteria
SELECT 0 as index_id, p.person_id, p.event_id
FROM (SELECT Q.person_id, Q.event_id, Q.visit_id, Q.start_date, Q.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM (-- Begin Condition Occurrence Criteria
SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID, C.visit_occurrence_id as visit_id
FROM 
(
  SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
  FROM @cdm_database_schema.CONDITION_OCCURRENCE co
  where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
) C

-- End Condition Occurrence Criteria
) Q
JOIN @cdm_database_schema.OBSERVATION_PERIOD OP on Q.person_id = OP.person_id 
  and OP.observation_period_start_date <= Q.start_date and OP.observation_period_end_date >= Q.start_date
) P
LEFT JOIN
(
  -- Begin Procedure Occurrence Criteria
select C.person_id, C.procedure_occurrence_id as event_id, C.procedure_date as start_date, (C.procedure_date + 1*INTERVAL'1 day') as END_DATE, C.procedure_concept_id as TARGET_CONCEPT_ID, C.visit_occurrence_id as visit_id
from 
(
  select po.*, ROW_NUMBER() over (PARTITION BY po.person_id ORDER BY po.procedure_date, po.procedure_occurrence_id) as ordinal
  FROM @cdm_database_schema.PROCEDURE_OCCURRENCE po
where po.procedure_concept_id in (SELECT concept_id from  Codesets where codeset_id = 1)
) C

-- End Procedure Occurrence Criteria

) A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE and A.visit_id = P.visit_id
GROUP BY p.person_id, p.event_id
HAVING COUNT(A.TARGET_CONCEPT_ID) >= 1
-- End Correlated Criteria

  ) 

What do you think?

gowthamrao commented 7 years ago

Option 2 is easier, but Option 1 is better because it give us so much more information as part of cohort-generation-report. But it will require a new temp table that will contain all visit_occurrence_id that corresponds to qualified_events. This table will have to include all the visit's for the person that may contribute to the person_id, start_date of the qualified_event.

create  #qualified_visit_occurrence
as
(
    select distinct VO.visit_occurrence_id
    from
        @cdm_database_schema.visit_occurrence_id VO
    JOIN qualified_events as QE on VO.person_id = QE.person_id and VO.visit_start_date = QE.start_date
    JOIN (
      -- Begin Condition Occurrence Criteria
    SELECT C.person_id, C.condition_occurrence_id as event_id, C.condition_start_date as start_date, COALESCE(C.condition_end_date, (C.condition_start_date + 1*INTERVAL'1 day')) as end_date, C.CONDITION_CONCEPT_ID as TARGET_CONCEPT_ID, C.visit_occurrence_id
    FROM 
    (
      SELECT co.*, ROW_NUMBER() over (PARTITION BY co.person_id ORDER BY co.condition_start_date, co.condition_occurrence_id) as ordinal
      FROM @cdm_database_schema.CONDITION_OCCURRENCE co
      where co.condition_concept_id in (SELECT concept_id from  Codesets where codeset_id = 0)
    ) C

    -- End Condition Occurrence Criteria

    ---- UNION OTHER @criteriaQueries of qualified_events with visit_occurrence_id
      ) P
     on VO.visit_occurrence_id = P.visit_occurrence_id
) P  
chrisknoll commented 7 years ago

Option 2 solves both your needs. But let's just first focus on the visit association functionality:

You're thought process is correct, that it involves enhancing the SQL that goes into nested criteria. Broadly speaking, nested criteria is just a container for a corelated query, which you see begin here:

    -- Begin Correlated Criteria
SELECT 0 as index_id, p.person_id, p.event_id
FROM (SELECT Q.person_id, Q.event_id, Q.visit_id, Q.start_date, Q.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
...
...
) A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE and A.visit_id = P.visit_id
GROUP BY p.person_id, p.event_id
HAVING COUNT(A.TARGET_CONCEPT_ID) >= 1
-- End Correlated Criteria

So the parts that need to change to make all this work is:

So you've shown all those things in your SQL updates, but the task is to find it in the code.

The individual domain criteria queries are found here: https://github.com/OHDSI/circe-be/tree/master/src/main/resources/resources/cohortdefinition/sql

Look for the names of the SQL that corresponds to the domain. If there's a domain (like observation period) that doesn't have a visit association, just return NULL as visit_id in the query. All these domain criteria queries must have the same column list in the select.

Next, we want to join the A subquery with P's visit_id which you described like so:

 A on A.person_id = P.person_id and A.START_DATE >= P.OP_START_DATE AND A.START_DATE <= P.OP_END_DATE AND A.START_DATE >= P.OP_START_DATE and A.START_DATE <= P.OP_END_DATE and A.visit_id = P.visit_id

You can find that sql fragment here: https://github.com/OHDSI/circe-be/blob/master/src/main/resources/resources/cohortdefinition/sql/additionalCriteria.sql

FYI: It's called 'additionalCriteria' because that's what the original idea called it, and it's why we use A in the correlated query's alias (A for Additional Criteria) and the P represents the Primary Events)

In the sql template you see this:

) A on A.person_id = P.person_id and @windowCriteria

The @windowCriteria value is built based on the before-after values specified in the cohort expression. We could add logic to the @windowCriteria replacement to optionally also limit to the same visit (ie: add AND A.visit_id = P.visit_id if restrictVisit == true)

Where is restrictVisit stored? In the cohort expression model, of course! The object that contains this info is found here: https://github.com/OHDSI/circe-be/blob/master/src/main/java/org/ohdsi/circe/cohortdefinition/CorelatedCriteria.java

And looks like this:

public class CorelatedCriteria {
  @JsonProperty("Criteria")
  public Criteria criteria;  

  @JsonProperty("StartWindow")
  public Window startWindow;  

  @JsonProperty("EndWindow")
  public Window endWindow;  

  @JsonProperty("Occurrence")
  public Occurrence occurrence;  
}

Quick explanation of these properties::

What is needed in this object is a new property (boolean property, default to false) which stores the choice to limit the correlated event to the same visit. This can be called 'restrictVisit' for example, and being false by default means by default, we don't want to restrict the visit.

I think that's all the pieces you need to implement 'same visit restrictions'. Let me know if you have questions.

chrisknoll commented 7 years ago

Re: Option 1: summary statistics about visits:

Once we have visit_id added to the #qualified_events temp table, we can then add some additional statistic generation logic to the query. You won't see this if you view the generated SQL in ATLAS: atlas just shows the cohort definition sql, and nothing about the statistic generation. You can see the statistics that are built here:

https://github.com/OHDSI/circe-be/blob/master/src/main/resources/resources/cohortdefinition/sql/generateCohort.sql#L84

When you added the collapsing logic, you probably saw this, but haven't looked at it deeply. This part of the code creates some temp tables to produce statistics, but also persists the generated statistics to a permanent table (in the current state, it writes data into @results_database_schema.cohort_inclusion_stats, for example). After the changes are made to add the visit_id to the qualified_events temp table, you can then add some logic to this 'generation stats' part of the code to write any sort of visit statistics that you are interested in.

-Chris

gowthamrao commented 7 years ago

Yes - agree with you on most - except

1. qualified_events and visit_occurrence_id

  • Add the visit_id into the primary events/#qualified_events select list (and qualified events temp table)
  • In the join for correlated query, join on the visit from the primary event with the visit from the correlated criteria.

visit_id added to the #qualified_events temp table

The problem, I see, in adding visit_occurrence_id into qualified_events is the current use of row_number() function , row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal to pick the earliest event, all events or latest event. The problem is because, a person may have many visits in visit_occurrence table with the same start_date that meets the qualified_event criteria. i.e. there may be many visit_occurrence_id for the same person_id + start_date + end_date.

Potential solution is to replace row_number() with dense_rank() , dense_rank() over (partition by pe.person_id order by pe.start_date ASC) as ordinal . This will treat all combination's of person_id + start_date the same and assign the same ordinal value. Selecting ordinal = 1 will retain all visit_occurrence records (unlike row_number) even when they have the same person_id, start_date combination.

The problem with this solution is that qualified_events may now have duplication of records for the same person_id, start_date, end_date combinations - but with different visit_occurrence_id. Would this duplication of person_id, start_date and end_date cause problems downstream to other sections that depend on qualified_events temp table (including statistics).

2. will this approach solve the visit_detail_id

visit_detail_id was added as part of OMOP v5.3. What if we want to restrict to the same visit_detail_id? Should we support both visit_occurrence_id and visit_detail_id as part of the qualified_event and clinical_event tables? Or atleast put this as a roadmap?

chrisknoll commented 7 years ago

It's true that the row_number() is used to uniquely identify the UNION'd events from the primary events...this is because using the record's id (condition_occurrence_id vs. drug_exposure_id) may collide, so when the union is performed, we need to tag each row uniquely. We also use this information to select the first event per person, it's true, but not required. If you want to consider all events occurring on the same day 'first' then you'd have to make a qualifying/nested criteria saying that there are 0 other events of the same type prior.

But I think we're off track of what we're trying to solve here. qualified events always had the potential of having different events happen on the same day. adding visit occurrence linking between events won't change that. You might consider that the way row_number() may lead to the same person_startdate pair, but that's something to be solved with how you logically express the cohort expression, not something fundamental that needs to be changed in the queries. The bottom line with primary events is that each event needs to have a unique identifier associated with the record. using row_number() == 1 was just a short cut to find the first row for a person. We can discuss making that more robust in a separate issue.

But on the topic of linking primary events to the correlated events via the visit_id: adding this column and optionally adding it to the join in the correlated sub query is not introducing any new information that breaks the way we currently row_number() today. So I'd like to see that we add the capablity of saying 'find another event within X days that happened within the same visit' and then deal with any potential problems that we currently have with primary_events having duplicate person_startdate records in it.

gowthamrao commented 7 years ago

will make it happen! :thumbsup:

gowthamrao commented 7 years ago

There are atleast two row_number() is QE, only one of them is used for event_id

Used for event_id

It's true that the row_number() is used to uniquely identify the UNION'd events from the primary events...this is because using the record's id (condition_occurrence_id vs. drug_exposure_id) may collide

Yes, your argument above for event_id using row_number() makes sense. The code to create event_id is

-- Begin Primary Events
select row_number() over (PARTITION BY P.person_id order by P.start_date) as event_id, P.person_id, P.start_date, P.end_date, OP.observation_period_start_date as op_start_date, OP.observation_period_end_date as op_end_date
FROM
(

Not used for event_id

 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, row_number() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

The above code is a different row_number() and is used for earliest-event, latest-event or all-events. What are your thoughts on converting that row_number() into dense_rank(). This will give us all the visit_occurrence_id that have the same person_id + start_date by assigning the same ordinal for all unique combinations of person_id + start_date, instead of assigning different ordinal for every occurrence of person_id + start_date in row_number().

 SELECT
event_id, person_id, start_date, end_date, op_start_date, op_end_date, visit_occurrence_id

FROM
(
  select pe.event_id, pe.person_id, pe.start_date, pe.end_date, pe.op_start_date, pe.op_end_date, pe.visit_occurrence_id, dense_rank() over (partition by pe.person_id order by pe.start_date ASC) as ordinal
  FROM primary_events pe

) QE

that's something to be solved with how you logically express the cohort expression, not something fundamental that needs to be changed in the queries. Agree that by selecting all-events we can solve this logically, but by using dense_rank we can avoid many human errors.

chrisknoll commented 7 years ago

I see, yes, if we keep the row_number() on the primary event UNIONS to uniquely identify them (to resolve potential domain-level id collisions) but in the case we're looking for the first occurrence we use dense_rank(), that should be fine. Now that we have cohort collapsing, we won't run into the issue where you have multiple records per person in the same time window. That's the other reason why we originally went with row_number()...we were seeing the same person show up multiple times and we needed a way to eliminate that.

So what you describe is worth a shot, just take care that where we use event_id should be unique per-record, and ordinal we can dense_rank it so that we could get multiple records in the event of a tie.

-Chris

gowthamrao commented 7 years ago

great - created PR https://github.com/OHDSI/circe-be/pull/7

chrisknoll commented 7 years ago

Nice, that's a very well written PR. Let me pull a local branch, run some tests, and if all goes well we can merge this in.

-Chris

chrisknoll commented 7 years ago

The back-end part of this is done, but the front end still needs to be changed to support the new 'restrictVisits' field.

anthonysena commented 6 years ago

Closed via OHDSI/circe-be#7 and #462.