callahantiff / PheKnowVec

Translational Computational Phenotyping
2 stars 0 forks source link

Query Verification: Systemic Lupus Erythematosus Cohort #108

Closed callahantiff closed 5 years ago

callahantiff commented 5 years ago

@mgkahn - Can you please help me verify the query to select Systemic Lupus Erythematosus patients?


COHORT CRITERIA Case Criteria:


Control Criteria:



Cohort Logic Table

COHORT CHUNK LOGICAL OPERATOR
CASE dx_case_inclusion_criteria_1 AND
CASE mx_case_inclusion_criteria_1 AND
CASE rx_case_inclusion_criteria_1 AND
CASE dx_case_exclusion_criteria_1 ---

NOTE.

  1. Replace all instances of {database} with CHCO_DeID_Oct2018
  2. Comment out all instances of {code_set_group}

Query can be found here and is also included below:

WITH dx_case_inclusion_criteria_1 AS (
SELECT co.person_id, cohort.standard_code_set AS code_set
  FROM 
   {database}.condition_occurrence co, 
   {database}.SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS cohort 
 WHERE co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
   AND cohort.phenotype_definition_number = 1
   AND cohort.standard_code_set = {code_set_group} 
 GROUP BY
    co.person_id, code_set
 HAVING
   COUNT(DISTINCT co.condition_concept_id) >= 3
),

mx_case_inclusion_criteria_1 AS (
SELECT m.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.measurement m, 
  {database}.SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS cohort  
WHERE 
  m.measurement_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 2 
  AND cohort.standard_code_set = {code_set_group}
  AND m.value_as_number > m.range_high
GROUP BY 
  m.person_id, cohort.standard_code_set
HAVING 
  COUNT(DISTINCT m.measurement_concept_id) >= 1
),

rx_case_inclusion_criteria_1 AS (
SELECT de.person_id, cohort.standard_code_set AS code_set 
FROM 
  {database}.drug_exposure de, 
  {database}.SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS cohort  
WHERE 
  de.drug_concept_id = CAST(cohort.standard_concept_id AS int64) 
  AND cohort.phenotype_definition_number = 3 
  AND cohort.standard_code_set = {code_set_group} 
GROUP BY 
  de.person_id, cohort.standard_code_set
HAVING 
  COUNT(DISTINCT de.drug_concept_id) >= 1
),

dx_case_exclusion_criteria_1 AS (
  SELECT co.person_id, cohort.standard_code_set AS code_set
    FROM 
     {database}.condition_occurrence co, 
     {database}.SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS cohort 
   WHERE co.condition_concept_id = CAST(cohort.standard_concept_id AS int64) 
     AND cohort.phenotype_definition_number = 4
     AND cohort.standard_code_set = {code_set_group} 
   GROUP BY
      co.person_id, code_set
   HAVING
     COUNT(DISTINCT co.condition_concept_id) >= 1
)

SELECT * FROM
  (SELECT person_id, code_set, 'SYSTEMICLUPUSERYTHEMATOSUS_CASE' AS cohort_type
    FROM (
    (SELECT person_id, code_set FROM dx_case_inclusion_criteria_1
      INTERSECT DISTINCT 
     SELECT person_id, code_set FROM mx_case_inclusion_criteria_1
      INTERSECT DISTINCT 
     SELECT person_id, code_set FROM rx_case_inclusion_criteria_1)
      EXCEPT DISTINCT
     SELECT person_id, code_set FROM dx_case_exclusion_criteria_1)
   GROUP BY person_id, code_set, cohort_type)
;
mgkahn commented 5 years ago

mx_case_inclusion_criteria says positive if titers > 1:40. SQL says positive if: m.value_as_number > m.range_high Have you looked at the values that come back from this CTE to see if all are numeric? Would not surprise me if some percentage were text, which you may need to ignore. Not sure what your range_high inequality would do with text values. Would it use lexicographic ordering for testing? You may need to do some type of check if numeric (maybe in a case statement) first before applying inequality.

mgkahn commented 5 years ago

rx_case_inclusion_criteria: I'm assuming SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS is a massive list of medications that has been reviewed for appropriate inclusion (including only oral or injectables). It has to be a massive list.....

mgkahn commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1

(cut-paste error from dx_case_inclusion_criteria_1)

mgkahn commented 5 years ago

End of SLE comments

callahantiff commented 5 years ago

mx_case_inclusion_criteria says positive if titers > 1:40. SQL says positive if: m.value_as_number > m.range_high Have you looked at the values that come back from this CTE to see if all are numeric? Would not surprise me if some percentage were text, which you may need to ignore. Not sure what your range_high inequality would do with text values. Would it use lexicographic ordering for testing? You may need to do some type of check if numeric (maybe in a case statement) first before applying inequality.

Good point. What would a result of > 1:40 even look like? Here is an example of what is returned for a nuclear antibody titer:

LOINC LOINC Label Result Range High
29953-7 Nuclear Ab [Titer] in Serum 320.0 40

I am guessing this (although not the right lab) would meet criteria? And the same logic would be used for a low results since the criteria includes ‡1:40 or, perhaps I am interpreting that incorrectly?

callahantiff commented 5 years ago

rx_case_inclusion_criteria: I'm assuming SYSTEMICLUPUSERYTHEMATOSUS_COHORT_VARS is a massive list of medications that has been reviewed for appropriate inclusion (including only oral or injectables). It has to be a massive list.....

Yes, it is a massive list and is currently under review. We will have two versions of the code sets:

  1. The reviewed matches, which fall under "domain knowledge"
  2. The raw computer-automated mappings (n=23,816), which are unchecked.

Until the review of the codes is complete, I am focusing on the first round of results which just include the computer-automated mappings.

callahantiff commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1

(cut-paste error from dx_case_inclusion_criteria_1)

Agreed, that was a typo. Shouldn't it be > 1 rather than >=1 to since it's excluding any occurrence of the codes?

mgkahn commented 5 years ago

mx_case_inclusion_criteria says positive if titers > 1:40. SQL says positive if: m.value_as_number > m.range_high Have you looked at the values that come back from this CTE to see if all are numeric? Would not surprise me if some percentage were text, which you may need to ignore. Not sure what your range_high inequality would do with text values. Would it use lexicographic ordering for testing? You may need to do some type of check if numeric (maybe in a case statement) first before applying inequality.

Good point. What would a result of > 1:40 even look like? Here is an example of what is returned for a nuclear antibody titer:

LOINC LOINC Label Result Range High 29953-7 Nuclear Ab [Titer] in Serum 320.0 40 I am guessing this (although not the right lab) would meet criteria? And the same logic would be used for a low results since the criteria includes ‡1:40 or, perhaps I am interpreting that incorrectly?

I believe your example, since it says "titers", is "saying" 1:320, which is definitely a very high titer. In this example, 40 seems to also be the range_high for this lab test so it seems the query lines up with the criterion.

mgkahn commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1 (cut-paste error from dx_case_inclusion_criteria_1)

Agreed, that was a typo. Shouldn't it be > 1 rather than >=1 to since it's excluding any occurrence of the codes?

If even 1 dx is reason to exclude (this is how I read the English fragment), then >=1 , not >1 (which means two or more).

callahantiff commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1 (cut-paste error from dx_case_inclusion_criteria_1)

Agreed, that was a typo. Shouldn't it be > 1 rather than >=1 to since it's excluding any occurrence of the codes?

If even 1 dx is reason to exclude (this is how I read the English fragment), then >=1 , not >1 (which means two or more).

Agree completely!

callahantiff commented 5 years ago

mx_case_inclusion_criteria says positive if titers > 1:40. SQL says positive if: m.value_as_number > m.range_high Have you looked at the values that come back from this CTE to see if all are numeric? Would not surprise me if some percentage were text, which you may need to ignore. Not sure what your range_high inequality would do with text values. Would it use lexicographic ordering for testing? You may need to do some type of check if numeric (maybe in a case statement) first before applying inequality.

Good point. What would a result of > 1:40 even look like? Here is an example of what is returned for a nuclear antibody titer: LOINC LOINC Label Result Range High 29953-7 Nuclear Ab [Titer] in Serum 320.0 40 I am guessing this (although not the right lab) would meet criteria? And the same logic would be used for a low results since the criteria includes ‡1:40 or, perhaps I am interpreting that incorrectly?

I believe your example, since it says "titers", is "saying" 1:320, which is definitely a very high titer. In this example, 40 seems to also be the range_high for this lab test so it seems the query lines up with the criterion.

OK, that makes sense for titers, but we will also have mappings from the automated approach that are not limited to titers. That was ultimately why I went with the approach that I did. I'm not saying it's perfect, but I think it will work 90% of the time. That said, I really want to do a good job so if there is a different approach that you recommend that I take I'm totally happy to discuss!

mgkahn commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1 (cut-paste error from dx_case_inclusion_criteria_1)

Agreed, that was a typo. Shouldn't it be > 1 rather than >=1 to since it's excluding any occurrence of the codes?

No, I think it should be >=1 because the existence of even 1 code is an exclusion. ">1" would allow a single instance to still be acceptable, which I think isn't what is desired.

callahantiff commented 5 years ago

dx_case_exclusion_criteria_1: I read the text as saying only 1 exclusion DX is sufficient (unlike inclusion DX which requires > 3). If so, replace HAVING COUNT(DISTINCT co.condition_concept_id) >= 3 with HAVING COUNT(DISTINCT co.condition_concept_id) >= 1 (cut-paste error from dx_case_inclusion_criteria_1)

Agreed, that was a typo. Shouldn't it be > 1 rather than >=1 to since it's excluding any occurrence of the codes?

No, I think it should be >=1 because the existence of even 1 code is an exclusion. ">1" would allow a single instance to still be acceptable, which I think isn't what is desired.

I also agree with you, I updated the query above to reflect >=1.

callahantiff commented 5 years ago

Confirmed during meeting that this is completed!