Closed callahantiff closed 5 years ago
The query above is correct for ETL. It may not be what you want to use to obtain standard concept_ids from ICD9CM codes in a query. In particular, on the query side, the very last clause in your WHERE statement: c1.invalid_reason = "" will return only those standard concept_ids that are valid on the day the query is run. But, if you are querying OMOP data that is ETL'd incrementally over time rather than as a full-dump (HDC uses incremental ETLs), then there will be data from older ETL cycles that are mapped to standard concept_ids that were valid then but are invalid now. The SQL above will not return currently invalid concept_ids.
In your example, removing the last clause did NOT change the results. But with medications, which change frequently, I suspect the results would vary over time/ETL cycles.
The query above is correct for ETL. It may not be what you want to use to obtain standard concept_ids from ICD9CM codes in a query. In particular, on the query side, the very last clause in your WHERE statement: c1.invalid_reason = "" will return only those standard concept_ids that are valid on the day the query is run. But, if you are querying OMOP data that is ETL'd incrementally over time rather than as a full-dump (HDC uses incremental ETLs), then there will be data from older ETL cycles that are mapped to standard concept_ids that were valid then but are invalid now. The SQL above will not return currently invalid concept_ids.
In your example, removing the last clause did NOT change the results. But with medications, which change frequently, I suspect the results would vary over time/ETL cycles.
Response:
Updated query shown for consistency:
SELECT DISTINCT
c.concept_code AS source_code,
c.concept_name AS source_name,
c.domain_id AS source_domain_id,
c.vocabulary_id AS source_vocabulary,
c1.concept_code AS standard_code,
c1.concept_name AS standard_name,
c1.domain_id AS standard_domain_id,
c1.vocabulary_id AS standard_vocabulary,
c1.standard_concept, c1.invalid_reason
FROM CHCO_DeID_Oct2018.concept c
JOIN CHCO_DeID_Oct2018.concept_relationship r
ON r.concept_id_1 = c.concept_id
JOIN CHCO_DeID_Oct2018.concept c1
ON r.concept_id_2 = c1.concept_id
WHERE c.concept_code IN ("314.01", "296")
AND c.vocabulary_id = "ICD9CM"
AND r.relationship_id = "Maps to"
AND c1.vocabulary_id = "SNOMED"
AND c1.standard_concept = "S";
Just highlighting that a query for ETL may not be what you want for a query. For ETL, the restriction to only valid concepts makes sense. But for a query against longitudinal data that is added incrementally, you need to consider how to handle deprecated concept_ids that will persist in the data warehouse. That's the point I was making. I would include the restriction for ETL processing and not include the restriction for queries.
OK, thank you for explaining that! I think we are good on this so I will close the issue.
@mgkahn - Can you please verify the following query is correct?
Purpose: This query is designed to find exact matches between source vocabulary code strings (e.g. ICD9CM 428) and an OMOP standard terminology concept code.
Result: