FINNGEN / ETL

second version for the ETL of the detailed longitudinal data to the OMOP-CDM
https://finngen.github.io/ETL/
MIT License
0 stars 0 forks source link

completenes errors in DQD that may show some other issues #106

Closed javier-gracia-tabuenca-tuni closed 1 year ago

javier-gracia-tabuenca-tuni commented 1 year ago
shanmukhasampath commented 1 year ago

PROCEDURE_OCCURRENCE

There are 343,022 out of 646,260 events have procedure_concept_id as 0 image

For the 343,022 events with procedure_concept_id as 0, there are 3,161 procedure codes.

Out of these 3,161 procedure codes there are 2,288 procedure codes have omop_concept_id but none of them map to a standard procedure concept id.

WITH temp AS (
SELECT FG_CODE1, code, vocabulary_id, omop_concept_id
FROM medical_codes.fg_codes_info_v3
WHERE CAST(omop_concept_id AS INT64) IN ( select DISTINCT cdmtable.procedure_source_concept_id
        from atlas-development-270609.etl_sam_dev_omop.procedure_occurrence cdmtable
        where cdmtable.procedure_concept_id = 0  AND cdmtable.procedure_source_concept_id != 0 )
)

SELECT t.*, cr.concept_id_2, c.* EXCEPT(concept_id)
FROM temp AS t
JOIN `etl_sam_dev_omop.concept_relationship` AS cr
ON cr.concept_id_1 = CAST(t.omop_concept_id AS INT64)
JOIN `etl_sam_dev_omop.concept` AS c
ON c.concept_id = cr.concept_id_2
ORDER BY t.FG_CODE1

These 2,288 procedure codes account for 87% of 343,022 events with procedure_concept_id as 0.

Rest of 873 procedure codes do not even have omop_concept_id

shanmukhasampath commented 1 year ago

VISIT_OCCURRENCE

All the visit_concept_id having 0 belong to DEATH registry. The DEATH non-standard omop_concept_id is 2002330104 which maps to standard concept id 0 which is shown below

image

This is expected.

javier-gracia-tabuenca-tuni commented 1 year ago

PROCEDURE_OCCURRENCE

There are 343,022 out of 646,260 events have procedure_concept_id as 0 image

For the 343,022 events with procedure_concept_id as 0, there are 3,161 procedure codes.

Out of these 3,161 procedure codes there are 2,288 procedure codes have omop_concept_id but none of them map to a standard procedure concept id.

WITH temp AS (
SELECT FG_CODE1, code, vocabulary_id, omop_concept_id
FROM medical_codes.fg_codes_info_v3
WHERE CAST(omop_concept_id AS INT64) IN ( select DISTINCT cdmtable.procedure_source_concept_id
      from atlas-development-270609.etl_sam_dev_omop.procedure_occurrence cdmtable
      where cdmtable.procedure_concept_id = 0  AND cdmtable.procedure_source_concept_id != 0 )
)

SELECT t.*, cr.concept_id_2, c.* EXCEPT(concept_id)
FROM temp AS t
JOIN `etl_sam_dev_omop.concept_relationship` AS cr
ON cr.concept_id_1 = CAST(t.omop_concept_id AS INT64)
JOIN `etl_sam_dev_omop.concept` AS c
ON c.concept_id = cr.concept_id_2
ORDER BY t.FG_CODE1

These 2,288 procedure codes account for 87% of 343,022 events with procedure_concept_id as 0.

Rest of 873 procedure codes do not even have omop_concept_id

Ok i found why, This is bcs the SPAT vocabulary codes are procedures and non of them are mapped. ~ half of the procedures events are SPAT