National-Clinical-Cohort-Collaborative / Data-Ingestion-and-Harmonization

CMS: Medicaid - 196 persons have multiple race values and 692 have multiple ethnicity values #114

Closed stephanieshong closed 1 year ago

stephanieshong commented 1 year ago

Found 196 persons with multiple race values in the source data. The multiple race values cause a duplicate primary key failure, because the key columns are used to build each person's primary key value. We will need to null out these race values and use the cleaned race values to build the person dataset. The list can be found here: https://unite.nih.gov/workspace/data-integration/dataset/preview/ri.foundry.main.dataset.1029742f-d5b8-4363-8cea-03914508c7c9/sh%2Fmultiple_eth_value
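
For illustration only, here is a minimal Python sketch of that check: find persons with more than one distinct race value, then null out race for them so each person yields a single row. The row layout, person IDs, race codes, and function names are all hypothetical; they are not the actual CMS Medicaid schema or the pipeline code.

```python
from collections import defaultdict

# Hypothetical source rows: (person_id, race_code).
# IDs and codes are made up for illustration, not the real CMS schema.
source_rows = [
    ("p1", "1"),
    ("p2", "2"),
    ("p2", "1"),  # second, conflicting race value for the same person
    ("p3", "3"),
    ("p3", "3"),  # exact repeat of the same value, not a conflict
]


def persons_with_multiple_values(rows):
    """Map person_id -> sorted distinct values, only for persons with more than one."""
    values = defaultdict(set)
    for person_id, value in rows:
        if value is not None:
            values[person_id].add(value)
    return {pid: sorted(vals) for pid, vals in values.items() if len(vals) > 1}


def null_out_race(rows, flagged):
    """Build one entry per person, with race set to None for flagged persons."""
    cleaned = {}
    for person_id, value in rows:
        cleaned[person_id] = None if person_id in flagged else value
    return cleaned


flagged = persons_with_multiple_values(source_rows)
person_race = null_out_race(source_rows, flagged)
```

In this sketch only `p2` is flagged, so `person_race` holds one entry per person and no longer produces conflicting keys.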

stephanieshong commented 1 year ago

The same column is used to specify both the race and ethnicity values. A separate dataset of persons with multiple race values, multiple ethnicity values, or multiple values in both is built, and these people are added with 0 as the race and 0 as the ethnicity.
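
A rough Python sketch of this approach, under stated assumptions: the shared column is split using two hypothetical code sets (`RACE_CODES`, `ETHNICITY_CODES`), and any person with more than one race or more than one ethnicity goes into a separate set with 0 for both fields. The codes, IDs, and function names are invented for illustration and are not taken from the actual pipeline.

```python
from collections import defaultdict

# Hypothetical codes: the real values in the shared column are not shown in this issue.
RACE_CODES = {"W", "B", "A"}
ETHNICITY_CODES = {"H", "N"}

# Hypothetical rows: (person_id, code from the shared race/ethnicity column).
rows = [
    ("p1", "W"), ("p1", "H"),  # one race and one ethnicity: clean
    ("p2", "W"), ("p2", "B"),  # multiple race values
    ("p3", "H"), ("p3", "N"),  # multiple ethnicity values
    ("p4", "A"),               # race only: clean
]


def collect(rows):
    """Group the shared column's codes into race and ethnicity sets per person."""
    per_person = defaultdict(lambda: {"race": set(), "ethnicity": set()})
    for person_id, code in rows:
        entry = per_person[person_id]
        if code in RACE_CODES:
            entry["race"].add(code)
        elif code in ETHNICITY_CODES:
            entry["ethnicity"].add(code)
    return per_person


def build_person_rows(rows):
    """Return (clean, flagged); flagged persons get 0 for race and 0 for ethnicity."""
    clean, flagged = {}, {}
    for person_id, vals in collect(rows).items():
        if len(vals["race"]) > 1 or len(vals["ethnicity"]) > 1:
            flagged[person_id] = (0, 0)
        else:
            race = next(iter(vals["race"]), None)
            ethnicity = next(iter(vals["ethnicity"]), None)
            clean[person_id] = (race, ethnicity)
    return clean, flagged


clean, flagged = build_person_rows(rows)
# Add the flagged persons back so every person appears exactly once.
person = {**clean, **flagged}
```

Keeping the flagged persons in their own mapping mirrors the separate dataset described above, and the final merge leaves one entry per person.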