Closed callahantiff closed 5 years ago
dx_case_inclusion_criteria_1: Had to look up LAG(). How did you ever find that one. Must have been lots of Googling. Based on the examples in GBQ documentation, don't you need to PARTITION BY PATIENT_ID rather than just ORDER BY PATIENT_ID, VISIT_START_DATETIME. Else, won't the last row of the previous patient and the first row of the next patient be compared in your DATETIME_DIFF?
dx_case_inclusion_criteria_1: What if visit1-visit2 < 30 days and visit2-visit3<30 days but visit1-visit3 > 30 days. Won't your DATETIME_DIFF using LAG() miss that the patient meets criteria because of visit1-visit3? Seems you need all combinations of visit_start_dt so you can find any combo that is >30 days, not just adjacent ones. (I think your LAG() logic only look at adjacent days but I've never used it)
dx_case_inclusion_criteria_1: "History of" conditions are in observation per OHDSI conventions. Not sure if PEDSnet is following that convention. Your query doesn't include any relevant conditions found in the observation table. Since these events will not have a relevant time stamp, not sure how you are supposed to use them in the calculation of >30 days part anyway. So I don't think this is a big loss. But just wanted to note that you are not grabbing history of conditions.
dx_case_inclusion_criteria_1: Where is the "or one hospitalization" option? I see it in dx_case_inclusion_criteria_2 but not in criteria_1.
dx_case_inclusion_criteria_1 and 2: Why not use visit__concept_id = 8061688 for outpatient visits rather than using admitting_source_concept_id in criteria_1 and care site.place_of_service in criteria_2? This is a standard code.
dx_case_exclusion_criteria_1: I am assuming: cohort.phenotype_definition_number = 2 are SS trait diagnoses cohort.phenotype_definition_number = 1 are SS disease diagnoses
Your final logic statement: Will the EXCEPT DISTINCT work as you want because of the code_set variable? Seems if somebody is included based on one code_set but excluded because of some other code_set, they won't be excluded because of the EXCEPT DISTINCT. Is this right? Should they get excluded no matter what code_set they are in? If so, you may need to rewrite to exclude based on a NOT IN clause using only PERSON_IDs in the dx_case_exclusion_criteria_1 CTE.
No more comments on SS.
dx_case_inclusion_criteria_1: Had to look up LAG(). How did you ever find that one. Must have been lots of Googling. Based on the examples in GBQ documentation, don't you need to PARTITION BY PATIENT_ID rather than just ORDER BY PATIENT_ID, VISIT_START_DATETIME. Else, won't the last row of the previous patient and the first row of the next patient be compared in your DATETIME_DIFF?
Definitely found it from an epic Googling session! š Good catch, thank you for pointing that out! I will make the change.
dx_case_inclusion_criteria_1 and 2: Why not use visit__concept_id = 8061688 for outpatient visits rather than using admitting_source_concept_id in criteria_1 and care site.place_of_service in criteria_2? This is a standard code.
Great suggestion, that's much simpler!
dx_case_inclusion_criteria_1: Where is the "or one hospitalization" option? I see it in dx_case_inclusion_criteria_2 but not in criteria_1.
Fixed!
dx_case_inclusion_criteria_1: "History of" conditions are in observation per OHDSI conventions. Not sure if PEDSnet is following that convention. Your query doesn't include any relevant conditions found in the observation table. Since these events will not have a relevant time stamp, not sure how you are supposed to use them in the calculation of >30 days part anyway. So I don't think this is a big loss. But just wanted to note that you are not grabbing history of conditions.
Thank you for pointing this out. I'm really glad we can have a record of this!
Your final logic statement: Will the EXCEPT DISTINCT work as you want because of the code_set variable? Seems if somebody is included based on one code_set but excluded because of some other code_set, they won't be excluded because of the EXCEPT DISTINCT. Is this right? Should they get excluded no matter what code_set they are in? If so, you may need to rewrite to exclude based on a NOT IN clause using only PERSON_IDs in the dx_case_exclusion_criteria_1 CTE.
I think it's ok as written. We want to allow for different patients to be included and excluded depending on the code set.
dx_case_exclusion_criteria_1: I am assuming: cohort.phenotype_definition_number = 2 are SS trait diagnoses cohort.phenotype_definition_number = 1 are SS disease diagnoses
Yep, that's right!
dx_case_inclusion_criteria_1: What if visit1-visit2 < 30 days and visit2-visit3<30 days but visit1-visit3 > 30 days. Won't your DATETIME_DIFF using LAG() miss that the patient meets criteria because of visit1-visit3? Seems you need all combinations of visit_start_dt so you can find any combo that is >30 days, not just adjacent ones. (I think your LAG() logic only look at adjacent days but I've never used it)
Hm...That's an interesting point. How do we write a query which does that? I got pretty stuck on this part of the query originally, which is why I landed on LAG()
. Are you suggesting messing with the ORDER BY
argument and/or also including the LEAD()
function?
In abstract you have two visit-occurrences, v1 and V2, joined by patid and in your where clause v1.starttime < v2 start time and datediff(v1,v2,day)>30 The JOIN and inequality gives you all combinations of visits where v1 occurs before v2
Michael G. Kahn MD, PhD Professor with Tenure, Section of Clinical Informatics, Department of Pediatrics Co-Director & Translational Informatics Core Director, Colorado Clinical and Translational Sciences Institute Director, Health Data Compass Associate Director, Colorado Center for Personalized Medicine University of Colorado Denver Anschutz Medical Campus Medical Director, Research Informatics Children's Hospital Colorado Research Institute Aurora, Colorado E: Michael.Kahn@ucdenver.edu P: 303-724-8334
From: Tiffany J. Callahan notifications@github.com Sent: Sunday, August 4, 2019 9:42:57 PM To: callahantiff/PheKnowVec PheKnowVec@noreply.github.com Cc: Kahn, Michael Michael.Kahn@ucdenver.edu; Mention mention@noreply.github.com Subject: Re: [callahantiff/PheKnowVec] Query Verification: Sickle Cell Disease Cohort (#105)
dx_case_inclusion_criteria_1: What if visit1-visit2 < 30 days and visit2-visit3<30 days but visit1-visit3 > 30 days. Won't your DATETIME_DIFF using LAG() miss that the patient meets criteria because of visit1-visit3? Seems you need all combinations of visit_start_dt so you can find any combo that is >30 days, not just adjacent ones. (I think your LAG() logic only look at adjacent days but I've never used it)
Hm...That's an interesting point. How do we write a query which does that? I got pretty stuck on this part of the query originally, which is why I landed on LAG(). Are you suggesting messing with the ORDER BY argument and/or also including the LEAD() function?
ā You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/callahantiff/PheKnowVec/issues/105?email_source=notifications&email_token=AA557TXAN27VZ5E46GCNPDLQC6OUDA5CNFSM4IF3OQ3KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3QTDPI#issuecomment-518074813, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA557TQY5FTV3MD56W7T7VLQC6OUDANCNFSM4IF3OQ3A.
In abstract you have two visit-occurrences, v1 and V2, joined by patid and in your where clause v1.starttime < v2 start time and datediff(v1,v2,day)>30 The JOIN and inequality gives you all combinations of visits where v1 occurs before v2 Michael G. Kahn MD, PhD Professor with Tenure, Section of Clinical Informatics, Department of Pediatrics Co-Director & Translational Informatics Core Director, Colorado Clinical and Translational Sciences Institute Director, Health Data Compass Associate Director, Colorado Center for Personalized Medicine University of Colorado Denver Anschutz Medical Campus Medical Director, Research Informatics Children's Hospital Colorado Research Institute Aurora, Colorado E: Michael.Kahn@ucdenver.edu P: 303-724-8334 ā¦ ____ From: Tiffany J. Callahan notifications@github.com Sent: Sunday, August 4, 2019 9:42:57 PM To: callahantiff/PheKnowVec PheKnowVec@noreply.github.com Cc: Kahn, Michael Michael.Kahn@ucdenver.edu; Mention mention@noreply.github.com Subject: Re: [callahantiff/PheKnowVec] Query Verification: Sickle Cell Disease Cohort (#105) dx_case_inclusion_criteria_1: What if visit1-visit2 < 30 days and visit2-visit3<30 days but visit1-visit3 > 30 days. Won't your DATETIME_DIFF using LAG() miss that the patient meets criteria because of visit1-visit3? Seems you need all combinations of visit_start_dt so you can find any combo that is >30 days, not just adjacent ones. (I think your LAG() logic only look at adjacent days but I've never used it) Hm...That's an interesting point. How do we write a query which does that? I got pretty stuck on this part of the query originally, which is why I landed on LAG(). Are you suggesting messing with the ORDER BY argument and/or also including the LEAD() function? ā You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#105?email_source=notifications&email_token=AA557TXAN27VZ5E46GCNPDLQC6OUDA5CNFSM4IF3OQ3KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3QTDPI#issuecomment-518074813>, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AA557TQY5FTV3MD56W7T7VLQC6OUDANCNFSM4IF3OQ3A.
OK, I think I follow (and I feel dumb for not doing this in the first place). So, something like this? Note that this is not tailored to the solution yet, but a more general query as I just want to make sure I understand.
SELECT v1.person_id, v1.visit_start_datetime AS v1_date, v2.visit_start_datetime AS v2_date, DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) > 30
FROM CHCO_DeID_Oct2018.visit_occurrence v1
JOIN (SELECT person_id, visit_start_datetime
FROM CHCO_DeID_Oct2018.visit_occurrence
WHERE visit_concept_id IN (9202)) v2
ON v1.person_id = v2.person_id
WHERE
v1.visit_concept_id IN (9202)
AND v1.visit_start_datetime < v2.visit_start_datetime
AND DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) > 30
GROUP BY v1.person_id, v1.visit_start_datetime, v2.visit_start_datetime
ORDER BY v1.visit_start_datetime, v2.visit_start_datetime;
Much easier than what you wrote:
SELECT v1.person_id, v1.visit_start_datetime AS First_Visit, v2.visit_start_datetime AS Second_Visit, DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) AS Days_Between_Visits FROM CHCO_DeID_Oct2018.visit_occurrence v1 JOIN CHCO_DeID_Oct2018.visit_occurrence v2 ON (v1.person_id = v2.person_id) WHERE v1.visit_concept_id in (9202) AND v2.visit_concept_id in (9202) AND v1.visit_start_datetime < v2.visit_start_datetime / ensures v1 is always before v2 / AND DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) > 30 ORDER BY v1.visit_start_datetime, v2.visit_start_datetime;
Much easier than what you wrote:
SELECT v1.person_id, v1.visit_start_datetime AS First_Visit, v2.visit_start_datetime AS Second_Visit, DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) AS Days_Between_Visits FROM CHCO_DeID_Oct2018.visit_occurrence v1 JOIN CHCO_DeID_Oct2018.visit_occurrence v2 ON (v1.person_id = v2.person_id) WHERE v1.visit_concept_id in (9202) AND v2.visit_concept_id in (9202) AND v1.visit_start_datetime < v2.visit_start_datetime / ensures v1 is always before v2 / AND DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) > 30 ORDER BY v1.visit_start_datetime, v2.visit_start_datetime;
Looks great, thanks!
OK, here is the updated query:
WITH dx_case_inclusion_criteria_1 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set
FROM
CHCO_DeID_Oct2018.condition_occurrence co,
CHCO_DeID_Oct2018.SICKLECELLDISEASE_COHORT_VARS cohort
WHERE
co.condition_concept_id = CAST(cohort.standard_concept_id AS int64)
AND cohort.standard_code_set = {code_set_group}
GROUP BY co.person_id, cohort.standard_code_set
HAVING
COUNT(DISTINCT co.condition_concept_id) >= 1
),
visit_inclusion_criteria_1 AS (
SELECT person_id FROM
-- identify patients with at least 2 outpatient visits at least 30 days apart
(SELECT v1.person_id
FROM
CHCO_DeID_Oct2018.visit_occurrence v1,
CHCO_DeID_Oct2018.visit_occurrence v2
WHERE
v1.person_id = v2.person_id
AND v1.visit_concept_id in (9202)
AND v2.visit_concept_id in (9202)
AND v1.visit_start_datetime < v2.visit_start_datetime -- this line forces 2 distinct visits
AND DATETIME_DIFF(DATETIME(v2.visit_start_datetime), DATETIME(v1.visit_start_datetime), DAY) >= 30
GROUP BY v1.person_id, v1.visit_start_datetime, v2.visit_start_datetime
ORDER BY
v1.visit_start_datetime, v2.visit_start_datetime)
UNION ALL
-- identify patients with at least 1 hospitalization
(SELECT person_id FROM
CHCO_DeID_Oct2018.visit_occurrence
WHERE
visit_concept_id in (9201, 9203)
GROUP BY
person_id, visit_occurrence_id
HAVING
COUNT(DISTINCT visit_occurrence_id) >= 1)
),
dx_case_exclusion_criteria_1 AS (
SELECT v.person_id, cohort.standard_code_set AS code_set
FROM
CHCO_DeID_Oct2018.visit_occurrence v,
CHCO_DeID_Oct2018.care_site c,
CHCO_DeID_Oct2018.condition_occurrence co,
CHCO_DeID_Oct2018.SICKLECELLDISEASE_COHORT_VARS cohort
JOIN
(SELECT DISTINCT v.person_id, COUNT(DISTINCT co.condition_concept_id) AS trait_count, cohort.standard_code_set AS code_set
FROM
CHCO_DeID_Oct2018.visit_occurrence v,
CHCO_DeID_Oct2018.care_site c,
CHCO_DeID_Oct2018.condition_occurrence co,
CHCO_DeID_Oct2018.SICKLECELLDISEASE_COHORT_VARS cohort
WHERE co.visit_occurrence_id = v.visit_occurrence_id
AND co.condition_concept_id = CAST(cohort.standard_concept_id AS int64)
AND cohort.phenotype_definition_number = 2
AND cohort.standard_code_set = {code_set_group}
GROUP BY
v.person_id, code_set) b
ON v.person_id = b.person_id
WHERE co.visit_occurrence_id = v.visit_occurrence_id
AND co.condition_concept_id = CAST(cohort.standard_concept_id AS int64)
AND cohort.phenotype_definition_number = 1
GROUP BY
v.person_id, code_set, b.trait_count
HAVING
COUNT(DISTINCT co.condition_concept_id) > b.trait_count
)
SELECT person_id, code_set, 'SICKLECELLDISEASE_CASE_TYPE1' AS cohort_type FROM (
SELECT person_id, code_set FROM dx_case_inclusion_criteria_1
WHERE person_id IN (SELECT person_id FROM visit_inclusion_criteria_1)
EXCEPT DISTINCT
SELECT person_id, code_set FROM dx_case_exclusion_criteria_1)
;
Do you approve? You will notice it has changed slightly from before. I re-read the phenotype description from eMERGE (see below) and realized I had made a mistake in my first pass. With the changes shown above, I feel like it should now be good to go!
TO DO:
@mgkahn - Can you please help me verify the query to select Sickle Cell Disease patients?
COHORT CRITERIA Case Criteria:
visit_concept_id = 9202
)visit_concept_id = 9203
)place_of_service_concept_id = 9201
)visit_concept_id = 9203
)place_of_service_concept_id = 9201
)visit_concept_id = 9202
)Control Criteria:
Cohort Logic Table
Covariates
SQL Query