Negation of cohort subset operator must join on subjetct_id AND start_date

When running a negation cohort subset operator, you may miss a person from the result because they have multiple events in their cohort that passes on one date and fails on another.

This is the cohort subset query:

SELECT B.subject_id,
       B.cohort_start_date,
       B.cohort_end_date --INTO #S_1
FROM (SELECT T.subject_id,
             T.cohort_start_date,
             T.cohort_end_date
      FROM #cohort_sub_base T
        JOIN @cohortTable S ON T.subject_id = S.subject_id
      WHERE S.cohort_definition_id IN (12467)
      AND   (S.cohort_start_date >= DATEADD(d,-365,T.cohort_start_date) AND S.cohort_start_date <= DATEADD(d,30,T.cohort_start_date))
      AND   (S.cohort_end_date >= DATEADD(d,-9999,T.cohort_start_date) AND S.cohort_end_date <= DATEADD(d,9999,T.cohort_start_date))
      GROUP BY T.subject_id,
               T.cohort_start_date,
               T.cohort_end_date
      HAVING COUNT(DISTINCT S.COHORT_DEFINITION_ID) >= 1) A
  ** RIGHT JOIN #cohort_sub_base B ON B.subject_id = A.subject_id ** 
WHERE A.subject_id IS NULL
AND B.SUBJECT_ID = 1254018103

Note the code in the RIGHT JOIN. The right join is what is going to identify those people who were in the base cohort but not in the resulting subset to NEGATE and return those people who didn't match on the cohort to subset with.

However, in a case where you have multiple events per person, you can have a case where a person matches on one date but not another, but the RIGHT JOIN is only joining on subject_id. The correct code should match on subject_id and cohort_start_date any time we are trying to identify the matching records:

SELECT B.subject_id,
       B.cohort_start_date,
       B.cohort_end_date --INTO #S_1
FROM (SELECT T.subject_id,
             T.cohort_start_date,
             T.cohort_end_date
      FROM #cohort_sub_base T
        JOIN @cohortTable S ON T.subject_id = S.subject_id
        AND s.SUBJECT_ID = 1254018103
      WHERE S.cohort_definition_id IN (12467)
      AND   (S.cohort_start_date >= DATEADD(d,-365,T.cohort_start_date) AND S.cohort_start_date <= DATEADD(d,30,T.cohort_start_date))
      AND   (S.cohort_end_date >= DATEADD(d,-9999,T.cohort_start_date) AND S.cohort_end_date <= DATEADD(d,9999,T.cohort_start_date))
      GROUP BY T.subject_id,
               T.cohort_start_date,
               T.cohort_end_date
      HAVING COUNT(DISTINCT S.COHORT_DEFINITION_ID) >= 1) A
  RIGHT JOIN #cohort_sub_base B ON B.subject_id = A.subject_id
   *** AND b.COHORT_START_DATE = a.COHORT_START_DATE ***
WHERE A.subject_id IS NULL
AND B.SUBJECT_ID = 1254018103

OHDSI / CohortGenerator

Negation of cohort subset operator must join on subjetct_id AND start_date #167