Missouri-BMI / OMOP_ON_P_CDM

Apache License 2.0
0 stars 0 forks source link

ATLAS ERROR: Incorrect Percentage Calculation in Demographics Characterization #85

Open mizq7 opened 4 weeks ago

mizq7 commented 4 weeks ago

@vasanthi014 @shossain-mizzou Atlas' Demographics characterization is consistently calculating incorrect total percentages for subgroups such as SEX, including categories like male, female, No Information, and Unknown. These totals are erroneously exceeding the expected 100%. This issue persists across both complex queries involving over 1800 ICD/CPD codes and simpler queries with a single code. This anomaly indicates a potential systemic error in the percentage calculation or data aggregation process.

image image

Screenshot 2024-08-22 at 10 35 04 AM

Screenshot 2024-08-22 at 10 35 14 AM

Screenshot 2024-08-22 at 10 35 22 AM

Action Item: Investigate and resolve the issue of inflated percentages in the Demographics characterization on the Atlas platform. Ensure the total for all subgroups accurately sums to 100%.

I would appreciate your quick suggestions. Please let me know if you have any question on this.

shossain-mizzou commented 3 days ago

@vasanthi014, we need to check if we have duplicate concepts stored per patient for demographic informations

shossain-mizzou commented 2 days ago

@vasanthi014, Can we make sure we dont have any duplicate patient records in all tables/views? Significant number of duplications is causing the error

select person_id, count(*) cn from cdm.person group by person_id order by cn desc;

shossain-mizzou commented 2 days ago

@vasanthi014, I think we should look into the implementation of OMOP_PCORNET_VALUESET_MAPPING Image

shossain-mizzou commented 2 days ago

since we have duplicate source_concept_class, adding "gender_map.pcornet_table_name = 'DEMOGRAPHIC'" will solve the issue

 on demographic.sex = gender_map.PCORNET_VALUESET_ITEM
         and gender_map.source_concept_class = 'Gender'
         and gender_map.pcornet_table_name = 'DEMOGRAPHIC' -- extra filtering 
shossain-mizzou commented 2 days ago

Duplication in Procedure_Occurence table,

select procedure_occurrence_id, count() from ATLAS_MU_DEV.CDM.PROCEDURE_OCCURRENCE group by procedure_occurrence_id order by count() desc

shossain-mizzou commented 2 days ago

duplicate procedure exists in the CDM

select * FROM DEIDENTIFIED_PCORNET_CDM.CDM.deid_procedures procedures where proceduresid = 12245482

shossain-mizzou commented 2 days ago

Duplication in the visit occurence table,

select visit_occurrence_id, count() from ATLAS_MU_DEV.CDM.VISIT_OCCURRENCE group by visit_occurrence_id order by count() desc limit 10;

Also, duplication exisits in the CDM

shossain-mizzou commented 2 days ago

duplication in measurement table,

--- measurement select MEASUREMENT_ID, count() from ATLAS_MU_DEV.CDM.measurement group by MEASUREMENT_ID order by count() desc limit 10;

Duplication existis in the CDM lab_results_cdm

shossain-mizzou commented 2 days ago

Error in Drug exposure,

--- drug exposure select DRUG_EXPOSURE_ID, count() from ATLAS_MU_DEV.CDM.DRUG_EXPOSURE group by DRUG_EXPOSURE_ID order by count() desc limit 10;

select * from ATLAS_MU_DEV.CDM.DRUG_EXPOSURE where DRUG_EXPOSURE_ID = 2086775;

select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_dispensing where dispensingid = 2086775;

single dispensing ID getting multiple patient in the OMOP. Which is definitely an Error. CDM looks fine

shossain-mizzou commented 2 days ago

Image

shossain-mizzou commented 2 days ago

duplication in death table in the OMOP, CDM looks fine

-- death select person_id, count() from ATLAS_MU_DEV.CDM.death group by person_id order by count() desc limit 10;

select * from ATLAS_MU_DEV.CDM.death where person_id = 540446;

select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_death where patid = 540446;

shossain-mizzou commented 2 days ago

duplication in condition occurence table, CDM looks fine,

-- condition_occurence select condition_occurrence_id, count() from ATLAS_MU_DEV.CDM.condition_occurrence group by condition_occurrence_id order by count() desc limit 10;

select * from ATLAS_MU_DEV.CDM.condition_occurrence where condition_occurrence_id = 31769352;

select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_diagnosis where diagnosisid = 31769352;

shossain-mizzou commented 2 days ago

condition_era and provider should be empty if there is no mappingImage

shossain-mizzou commented 2 days ago

@vasanthi014, Above I listed all the view and table names and their corresponding issues that needs to be fixed

vasanthi014 commented 2 days ago

@vasanthi014, Above I listed all the view and table names and their corresponding issues that needs to be fixed

Thank you @shossain-mizzou I will take a look at these issues in the data.