National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization

crosswalks impacted by possible bad relationships in OMOP concept_ancestor table. #120

Open chrisroederucdenver opened 11 months ago

chrisroederucdenver commented 11 months ago

Because we were producing too many visits (more than one per day on average), we have been merging visits that were split across monthly bills. This ticket addresses some one-to-many mappings that may also produce too many visits, but we should re-evaluate that merging heuristic as well.

I ran into discrepancies between row counts and counts of distinct natural keys in the visit_occurrence table, and I think this traces back to issues in the OMOP concept_ancestor table. That needs to be reported to OMOP, and until it's fixed we need to consider workarounds or tests.

COUNTS

When creating the visit_occurrence table by rolling up lt_visit_detail rows, I count the resulting rows and compare that to the number of unique combinations of (person_id, care_site_id, provider_id, visit_start_date, visit_end_date, visit_concept_id). Here the two numbers differ by close to a factor of 2. The latter is what I consider the number of entities in that table. We do have a surrogate primary key, but it includes more fields and can hide issues like this.
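The row-count check described above can be sketched in Python as follows. This is a minimal illustration, assuming the rolled-up visits are available as a list of dicts keyed by column name; the function name `audit_natural_key` is hypothetical and not part of the repository. It reports the raw row count, the number of distinct natural keys, and which keys occur more than once.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# The natural key named in the issue: fields that should identify one visit.
NATURAL_KEY: Tuple[str, ...] = (
    "person_id",
    "care_site_id",
    "provider_id",
    "visit_start_date",
    "visit_end_date",
    "visit_concept_id",
)


def audit_natural_key(
    rows: Iterable[Dict[str, Any]],
    key_fields: Tuple[str, ...] = NATURAL_KEY,
) -> Tuple[int, int, Dict[Tuple[Any, ...], int]]:
    """Compare the raw row count with the number of distinct natural keys.

    Returns (row_count, distinct_key_count, duplicated_keys), where
    duplicated_keys maps each natural key seen more than once to its count.
    """
    # Count how many rows share each natural-key tuple.
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    row_count = sum(counts.values())
    duplicated = {key: n for key, n in counts.items() if n > 1}
    return row_count, len(counts), duplicated
```

If the two counts differ, the duplicated keys point at the rows to inspect. A surrogate key such as a generated visit_occurrence_id would be distinct on every row, which is why a check on it alone would not catch this.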

CAUSE

Querying for count(distinct X), where X is a list of fields that should be unique for each entity, I found that some visits showed up as both in-patient and out-patient. Digging further, I looked at the join with cms_medicaid_place_of_visit_xwalk and found that it had multiple entries for AO, Hospital: one to in-patient and one to out-patient. Going deeper still, it looks like this comes from data in the OMOP concept_ancestor table.
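A test for this kind of problem could check the crosswalk directly for source values that map to more than one target concept, since each visit matching such a value is duplicated by the join, once per target. The sketch below is hypothetical: it assumes the crosswalk rows are dicts, and the column names (`src_code`, `src_desc`, `target_concept_id`) are illustrative rather than the actual columns of cms_medicaid_place_of_visit_xwalk. The concept IDs 9201 (Inpatient Visit) and 9202 (Outpatient Visit) are standard OMOP visit concepts.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set, Tuple


def one_to_many_mappings(
    xwalk_rows: Iterable[Dict[str, Any]],
    key_fields: Tuple[str, ...],
    target_field: str,
) -> Dict[Tuple[Any, ...], Set[Any]]:
    """Return crosswalk source keys that map to more than one target value."""
    targets: Dict[Tuple[Any, ...], Set[Any]] = defaultdict(set)
    for row in xwalk_rows:
        # Collect every distinct target seen for this source key.
        targets[tuple(row[f] for f in key_fields)].add(row[target_field])
    return {key: tgts for key, tgts in targets.items() if len(tgts) > 1}


# Hypothetical crosswalk rows mirroring the "AO, Hospital" case in the issue.
xwalk = [
    {"src_code": "AO", "src_desc": "Hospital", "target_concept_id": 9201},
    {"src_code": "AO", "src_desc": "Hospital", "target_concept_id": 9202},
]
print(one_to_many_mappings(xwalk, ("src_code", "src_desc"), "target_concept_id"))
```

An empty result would indicate the crosswalk is at most one-to-one on the chosen key; a non-empty one lists the entries that would fan out visits in the join.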

Have a look:

FIX


stephanieshong commented 10 months ago

Should we just update the A0 value ourselves, so that the change is limited to LT?