OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense
222 stars 75 forks source link

Changing Domains #130

Closed ericaVoss closed 6 years ago

ericaVoss commented 7 years ago

Looks like about 1,198 standard concepts changed domains between Vocab 20161231 to Vocab 20170503. Most look good, however a few look incorrect:

CONCEPT_ID  CONCEPT_NAME    CHANGE
4046886     Smoking reduced Observation --> Condition
4308357     Goal not achieved   Observation --> Condition
44807379    Complies with dietary regimen   Observation --> Condition
44811248    Goal achievement status unknown Observation --> Condition
40486696    Smoked before confirmation of pregnancy Observation --> Condition
4308000     Noncompliance with infection control regimen    Observation --> Condition
4052030     Moderate cigarette smoker   Observation --> Condition
46273081    No known exposure to tobacco smoke  Observation --> Condition
4190830     Poor compliance with inhaler    Observation --> Condition
45770364    Inappropriate use of walk-in centre Observation --> Condition
37016593    Non-compliant with peritoneal dialysis prescription Observation --> Condition
4058137     Trying to give up smoking   Observation --> Condition
4246988     Compliance issues discussed with patient    Observation --> Condition
4222303     Non-smoker  Observation --> Condition
4265571     Finding related to ability to comply with treatment Observation --> Condition
4092281     Ex-cigarette smoker Observation --> Condition
4043060     Chews twist tobacco Observation --> Condition
4276526     Cigarette smoker    Observation --> Condition
4032524     Noncompliance with therapeutic regimen  Observation --> Condition
45773499    Appropriate use of walk-in centre   Observation --> Condition
46270993    Complies with therapeutic regimen   Observation --> Condition

This could be related to #96. Maybe in OHDSI we have decided smoking is a condition?

I didn't capture everything, most of them looked related to smoking or treatment compliance.

Just a simple query to review

SELECT '9.18'   AS TEST_ID,
    'Consistency Check - Changing Domains'  AS TEST_DESCRIPTION,
    'CONCEPT'   AS VOCAB_TABLE,
     CAST(c1.CONCEPT_ID AS VARCHAR(50)) AS ID,
    'CONCEPT_ID' AS ID_NAME,
    'CONCEPT_NAME <> CURRENT <> PREVIOUS'               AS EVIDENCE_COLUMNS,
     CAST(c1.CONCEPT_NAME AS VARCHAR(50)) + ' <> ' + CAST(c1.DOMAIN_ID AS VARCHAR(50)) + ' <> ' + CAST(c2.DOMAIN_ID AS VARCHAR(50)) AS EVIDENCE
FROM CONCEPT c1
    JOIN OMOP_VOCABULARY_20161218.dbo.CONCEPT c2
        ON c2.CONCEPT_ID = c1.CONCEPT_ID
        AND c2.DOMAIN_ID != c1.domain_id
WHERE c1.STANDARD_CONCEPT = 'S'
ORDER BY c1.DOMAIN_ID, c2.CONCEPT_ID
dimshitc commented 7 years ago

Yeah, SNOMED changed it's hierarchy so now we have 1. blood disorders now have ancestor: 441742003 Evaluation finding (finding) so to avoid this we need to add condition "64572001 disease" to make them diseases ;

  1. (363787002, 'Observation'); -- Observable entity now has more children then it should be and they made it really awfull, look: http://ontoserver.csiro.au/shrimp/?concept=364075005&system=http://snomed.info/sct&versionId=http://snomed.info/sct/900000000000207008/version/20170131

  2. old SNOMED had "vital signs" as a ancestor, http://ontoserver.csiro.au/shrimp/?concept=46680005&system=http://snomed.info/sct&versionId=http://snomed.info/sct/900000000000207008/version/20160731 but now it's absent there.

"Normal urine color" - now is Clinical Finding was - "Evaluation finding",

1st case is OK and easy fixable but 2nd and 3d looks confusing it looks more as a SNOMED bug not our. but anyway, in the next release (for about month from now we'll fix this)

PRijnbeek commented 7 years ago

We also see many domain changes from Procedures to Observations for example:

4302396 Telephone Consultation 4295053 Diabetic Care etc 4259432 Home Visit ... many more

I guess the reason is a change in SNOMED? What I do not understand is why the Concept Class is still Procedure and Domain Observation?

cgreich commented 7 years ago

Actually, these are not procedures, but Services. Those should be Observations. But we will review the hierarchical logic, @PRijnbeek

ericaVoss commented 7 years ago

@clairblacketer and I have just been thinking about this a bit more. The most notable change is that Anemia and their children are moving from CONDITION_OCCURRENCE to MEASUREMENT. This seems to have the biggest impact on our data. I've attached a XLS that shows you which concepts have the most rows in a large claims database.

PREVALANT CONCEPT DOMAIN CHANGES.xlsx

WITH CTE_CHANGE AS (
    SELECT '9.18'   AS TEST_ID,
        'Consistency Check - Changing Domains'  AS TEST_DESCRIPTION,
        'CONCEPT'   AS VOCAB_TABLE,
         c1.CONCEPT_ID AS ID,
        'CONCEPT_ID' AS ID_NAME,
        'CONCEPT_NAME <> CURRENT <> PREVIOUS'               AS EVIDENCE_COLUMNS,
         CAST(c1.CONCEPT_NAME AS VARCHAR(50)) + ' <> ' + CAST(c1.DOMAIN_ID AS VARCHAR(50)) + ' <> ' + CAST(c2.DOMAIN_ID AS VARCHAR(50)) AS EVIDENCE
    FROM VOCABULARY_20161218.dbo.CONCEPT c1
        JOIN VOCABULARY_20170503.dbo.CONCEPT c2
            ON c2.CONCEPT_ID = c1.CONCEPT_ID
            AND c2.DOMAIN_ID != c1.domain_id
    WHERE c1.STANDARD_CONCEPT = 'S'
), 
CTE_CURRENT_PLACE AS (
    SELECT CONDITION_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM CONDITION_OCCURRENCE
    WHERE CONDITION_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY CONDITION_CONCEPT_ID
    UNION
    SELECT PROCEDURE_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM PROCEDURE_OCCURRENCE
    WHERE PROCEDURE_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY PROCEDURE_CONCEPT_ID
    UNION
    SELECT OBSERVATION_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM OBSERVATION
    WHERE OBSERVATION_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY OBSERVATION_CONCEPT_ID
    UNION
    SELECT OBSERVATION_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM OBSERVATION
    WHERE OBSERVATION_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY OBSERVATION_CONCEPT_ID
    UNION
    SELECT MEASUREMENT_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM MEASUREMENT
    WHERE MEASUREMENT_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY MEASUREMENT_CONCEPT_ID
    UNION
    SELECT DRUG_CONCEPT_ID AS CONCEPT_ID, COUNT(*) AS ROW_COUNT
    FROM DRUG_EXPOSURE
    WHERE DRUG_CONCEPT_ID IN (
        SELECT DISTINCT ID FROM CTE_CHANGE
    )
    GROUP BY DRUG_CONCEPT_ID
)
SELECT c.ID,
    c.EVIDENCE_COLUMNS,
    c.EVIDENCE, 
    CASE WHEN cp.ROW_COUNT IS NULL THEN 0 ELSE cp.ROW_COUNT END AS ROW_COUNT
FROM CTE_CHANGE c
    LEFT OUTER JOIN CTE_CURRENT_PLACE cp
        ON cp.CONCEPT_ID = c.ID
ORDER BY ROW_COUNT DESC
dimshitc commented 6 years ago

Well, in a current version of the vocabulary all the concepts you mentioned belong to the Observation domain. Now we added manual review of domain changes after every SNOMED update as routine to avoid unexpected domain changes