National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization

Data Ingestion and Harmonization

Sites are sending in domain data associated with multiple person_id #75

Closed stephanieshong closed 2 years ago

stephanieshong commented 3 years ago

Sites are submitting domain data in which a single visit_occurrence_id is associated with multiple person_id values.

stephanieshong commented 3 years ago

For each domain, any visit_occurrence_id associated with more than one person_id is flagged. Then, across all domains, rows with a flagged visit_occurrence_id are dropped.
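
The flag-then-drop rule above could be sketched roughly as follows. This is only an illustration in pandas, not the pipeline's actual code: the helper names flagged_visit_ids and drop_flagged_rows, the dict-of-DataFrames layout, and the sample data are all assumptions made for the example.

```python
import pandas as pd


def flagged_visit_ids(domain_df: pd.DataFrame) -> set:
    """Hypothetical helper: visit_occurrence_ids tied to more than one person_id in one domain."""
    persons_per_visit = domain_df.groupby("visit_occurrence_id")["person_id"].nunique()
    return set(persons_per_visit[persons_per_visit > 1].index)


def drop_flagged_rows(domains: dict) -> dict:
    """Hypothetical helper: drop rows with a flagged visit id from every domain."""
    # Collect the flagged ids from each domain into one shared set.
    flagged = set()
    for df in domains.values():
        flagged |= flagged_visit_ids(df)
    # Apply the shared set to all domains, including ones that flagged nothing themselves.
    return {
        name: df[~df["visit_occurrence_id"].isin(flagged)]
        for name, df in domains.items()
    }


# Made-up sample data: visit 1 maps to persons 10 and 11 in condition_occurrence.
domains = {
    "condition_occurrence": pd.DataFrame(
        {"visit_occurrence_id": [1, 1, 2], "person_id": [10, 11, 12]}
    ),
    "measurement": pd.DataFrame(
        {"visit_occurrence_id": [1, 2, 3], "person_id": [10, 12, 13]}
    ),
}
cleaned = drop_flagged_rows(domains)
```

In this sample, visit 1 is flagged only in condition_occurrence, yet its row is also removed from measurement, which is the "across all domains" part of the rule.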

stephanieshong commented 2 years ago

If a visit_occurrence_id is associated with multiple person_id values, its rows are excluded, since we have no way of knowing which patient the visit should be associated with.

stephanieshong commented 2 years ago

This is done in the LDS cleaning step: get_rows_with_multiple_persons_per_visit(domain_df) returns the rows that have a visit_occurrence_id associated with more than one person_id.
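
For readers without access to the cleaning step, a function with that name and docstring might look something like the sketch below. The body is a guess written in pandas for illustration; the real implementation is not shown in this thread and may use a different dataframe library.

```python
import pandas as pd


def get_rows_with_multiple_persons_per_visit(domain_df: pd.DataFrame) -> pd.DataFrame:
    """Return rows that have a visit_occurrence_id associated with more than 1 person_id."""
    # Assumed approach: count distinct person_ids per visit, aligned back to each row.
    persons_per_visit = domain_df.groupby("visit_occurrence_id")["person_id"].transform(
        "nunique"
    )
    return domain_df[persons_per_visit > 1]


# Made-up sample: visit 100 has two different persons; visit 102 repeats the same person.
sample_df = pd.DataFrame(
    {
        "visit_occurrence_id": [100, 100, 101, 102, 102],
        "person_id": [1, 2, 3, 4, 4],
    }
)
bad_rows = get_rows_with_multiple_persons_per_visit(sample_df)
```

Counting distinct person_id values rather than rows means a visit that simply appears on several rows for the same patient, like visit 102 here, is not returned.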