National-COVID-Cohort-Collaborative / Data-Ingestion-and-Harmonization


Alpha characters in TriNetX pat_IDs and encounter_IDs #49

Closed DaveraGabriel closed 2 years ago

DaveraGabriel commented 4 years ago

Need to resolve the difference in data types. The OMOP data type for visit_occurrence.visit_occurrence_id and measurement.visit_occurrence_id is INTEGER. Per the mapping validation session: "...for a fair number of data providers, there are alpha characters in patient_id and encounter_id. Need to have MP, KK, CB on call to resolve this and look at some sample encounter_ids."

DaveraGabriel commented 4 years ago

From Matvey Palchuk: @Smita Hastak I surveyed encounter identifiers for CTSA sites on TriNetX network. 4 of them have IDs with non-numeric (defined as not in [0-9]) characters. 3 of those 4 have some IDs with “-” preceding a numeric string, and one has “-1” for some encounter IDs. Caveat - there’s no guarantee that any of these 4 sites will decide to use TriNetX for submission to N3C, so you might not encounter any of these at all. But in case we do, I would suggest replacing “-” by some number (to preserve uniqueness of the ID) - say, a zero - and that should take care of this issue.
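
A minimal sketch of the suggestion above, assuming the encounter IDs arrive as strings. The function names and sample IDs are hypothetical and not taken from the N3C pipeline. It flags IDs containing characters outside [0-9], replaces "-" with "0", and checks whether the replacement makes two distinct source IDs collide, since the substitution alone may not guarantee uniqueness.

```python
import re

# Matches any character that is not a digit 0-9 (the survey's definition).
NON_DIGIT = re.compile(r"[^0-9]")


def has_non_numeric(raw_id: str) -> bool:
    """Return True if the ID contains any character outside [0-9]."""
    return bool(NON_DIGIT.search(raw_id))


def replace_hyphen(raw_id: str, filler: str = "0") -> str:
    """Replace every '-' with a filler digit (zero, as suggested in the thread)."""
    return raw_id.replace("-", filler)


def find_collisions(raw_ids):
    """List (first_raw, second_raw, cleaned) where two different raw IDs clean to the same value."""
    seen = {}
    collisions = []
    for raw in raw_ids:
        cleaned = replace_hyphen(raw)
        if cleaned in seen and seen[cleaned] != raw:
            collisions.append((seen[cleaned], raw, cleaned))
        else:
            seen.setdefault(cleaned, raw)
    return collisions


# Hypothetical sample encounter IDs, for illustration only.
sample_ids = ["12345", "-12345", "-1", "98765"]
flagged = [i for i in sample_ids if has_non_numeric(i)]
cleaned = [replace_hyphen(i) for i in sample_ids]
collisions = find_collisions(sample_ids + ["012345"])
```

Note that if the cleaned strings are later cast to INTEGER, leading zeros are dropped, so "012345" and "12345" would become the same value. A collision check after the cast would be needed as well.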

stephanieshong commented 2 years ago

All domain IDs are regenerated within N3C, and the uniqueness of the IDs is checked.
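
The thread does not describe how N3C regenerates IDs, so the following is only an illustrative sketch of one way to do it: each (data partner, source ID) pair is assigned a new integer surrogate key, and uniqueness is then verified. The names `regenerate_ids` and `ids_are_unique` and the sample values are hypothetical.

```python
from itertools import count


def regenerate_ids(source_keys, start=1):
    """Map each distinct (data_partner, source_id) pair to a new integer ID.

    The source ID may contain any characters; only the generated integer
    would be stored in the OMOP INTEGER column.
    """
    counter = count(start)
    mapping = {}
    for key in source_keys:
        if key not in mapping:
            mapping[key] = next(counter)
    return mapping


def ids_are_unique(mapping) -> bool:
    """Check that no two source keys were assigned the same generated ID."""
    generated = list(mapping.values())
    return len(generated) == len(set(generated))


# Hypothetical source encounter IDs from two sites, including alpha characters.
source_keys = [
    ("site_a", "-12345"),
    ("site_a", "ENC-0007"),
    ("site_b", "-12345"),   # same raw ID at a different site
    ("site_a", "-12345"),   # repeated row for the same encounter
]
id_map = regenerate_ids(source_keys)
```

Keying on the pair rather than on the raw ID alone keeps identical source IDs from different sites apart, and a repeated source key reuses the ID it was already given.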