Open mizq7 opened 4 weeks ago
@vasanthi014, we need to check whether duplicate concepts are stored per patient for demographic information.
@vasanthi014, can we make sure we don't have any duplicate patient records in any of the tables/views? A significant number of duplicates is causing the error.
select person_id, count(*) cn from cdm.person group by person_id order by cn desc;
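A variant of the query above that returns only the duplicated IDs may be easier to scan on large tables. This is a minimal sketch; the HAVING filter is the only addition.

```sql
-- Return only person_id values that appear more than once in cdm.person.
select person_id, count(*) as cn
from cdm.person
group by person_id
having count(*) > 1
order by cn desc;
```

An empty result would mean person_id is unique in the table.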
@vasanthi014, I think we should look into the implementation of OMOP_PCORNET_VALUESET_MAPPING.
Since the same source_concept_class appears more than once in the mapping, adding "gender_map.pcornet_table_name = 'DEMOGRAPHIC'" to the join should solve the issue:
on demographic.sex = gender_map.PCORNET_VALUESET_ITEM
and gender_map.source_concept_class = 'Gender'
and gender_map.pcornet_table_name = 'DEMOGRAPHIC' -- extra filtering
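To confirm which mapping rows cause the fan-out before changing the join, a query along these lines could help. It is only a sketch: the column names are taken from the join above, and it assumes OMOP_PCORNET_VALUESET_MAPPING can be referenced unqualified in the current schema.

```sql
-- List Gender valueset items that have more than one mapping row.
-- table_count shows how many PCORnet tables each item is mapped under.
-- Assumption: the mapping table is reachable without a database/schema prefix.
select source_concept_class,
       pcornet_valueset_item,
       count(distinct pcornet_table_name) as table_count,
       count(*) as row_count
from OMOP_PCORNET_VALUESET_MAPPING
where source_concept_class = 'Gender'
group by source_concept_class, pcornet_valueset_item
having count(*) > 1
order by row_count desc;
```

Any item returned here would match more than one mapping row in the join unless pcornet_table_name is also filtered.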
Duplication in the PROCEDURE_OCCURRENCE table:
select procedure_occurrence_id, count(*) from ATLAS_MU_DEV.CDM.PROCEDURE_OCCURRENCE group by procedure_occurrence_id order by count(*) desc;
The duplicate procedure also exists in the CDM:
select * FROM DEIDENTIFIED_PCORNET_CDM.CDM.deid_procedures procedures where proceduresid = 12245482
Duplication in the visit occurrence table:
select visit_occurrence_id, count(*) from ATLAS_MU_DEV.CDM.VISIT_OCCURRENCE group by visit_occurrence_id order by count(*) desc limit 10;
The duplication also exists in the CDM.
Duplication in the measurement table:
-- measurement
select MEASUREMENT_ID, count(*) from ATLAS_MU_DEV.CDM.measurement group by MEASUREMENT_ID order by count(*) desc limit 10;
The duplication exists in the CDM table lab_results_cdm.
Error in the drug exposure table:
-- drug exposure
select DRUG_EXPOSURE_ID, count(*) from ATLAS_MU_DEV.CDM.DRUG_EXPOSURE group by DRUG_EXPOSURE_ID order by count(*) desc limit 10;
select * from ATLAS_MU_DEV.CDM.DRUG_EXPOSURE where DRUG_EXPOSURE_ID = 2086775;
select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_dispensing where dispensingid = 2086775;
Duplication in the death table in OMOP; the CDM looks fine:
-- death
select person_id, count(*) from ATLAS_MU_DEV.CDM.death group by person_id order by count(*) desc limit 10;
select * from ATLAS_MU_DEV.CDM.death where person_id = 540446;
select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_death where patid = 540446;
Duplication in the condition occurrence table; the CDM looks fine:
-- condition_occurrence
select condition_occurrence_id, count(*) from ATLAS_MU_DEV.CDM.condition_occurrence group by condition_occurrence_id order by count(*) desc limit 10;
select * from ATLAS_MU_DEV.CDM.condition_occurrence where condition_occurrence_id = 31769352;
select * from DEIDENTIFIED_PCORNET_CDM.CDM.deid_diagnosis where diagnosisid = 31769352;
condition_era and provider should be empty if there is no mapping
@vasanthi014, above I have listed all the view and table names and the corresponding issues that need to be fixed.
Thank you, @shossain-mizzou. I will take a look at these issues in the data.
@vasanthi014 @shossain-mizzou The Demographics characterization in Atlas consistently reports subgroup percentages that sum to more than 100%. For SEX, for example, the categories male, female, No Information, and Unknown together exceed 100%. The issue occurs both in complex queries with over 1800 ICD/CPT codes and in simple queries with a single code, which suggests a systemic error in the percentage calculation or the data aggregation.
Action Item: Investigate and resolve the issue of inflated percentages in the Demographics characterization on the Atlas platform. Ensure the total for all subgroups accurately sums to 100%.
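One quick check that might narrow this down is to compare row counts with distinct-person counts per gender concept. This is a sketch only: it assumes the person table lives at ATLAS_MU_DEV.CDM.person like the other OMOP tables in this thread, and it relies on the standard OMOP column gender_concept_id. It does not reproduce how Atlas computes the percentages.

```sql
-- Compare row counts with distinct-person counts per gender concept.
-- Assumption: the OMOP person table is at ATLAS_MU_DEV.CDM.person.
-- If person_rows is larger than distinct_persons for any gender concept,
-- duplicated person rows exist and could inflate subgroup totals.
select gender_concept_id,
       count(*) as person_rows,
       count(distinct person_id) as distinct_persons
from ATLAS_MU_DEV.CDM.person
group by gender_concept_id
order by person_rows desc;
```

If the two counts match for every gender concept, the inflation more likely comes from a later join or aggregation step than from the person table itself.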
I would appreciate your suggestions as soon as possible. Please let me know if you have any questions about this.