Closed MaximMoinat closed 3 years ago
Counting all the source codes for target concept 37393850 gives:
The code 1022481000000109 from covid19 gp_emis is the SNOMED code for "MCHC - Mean corpuscular haemoglobin concentration". The ETL maps 1.5M records with this code to the measurement table. To be investigated: is this correct, or is it a source data or ETL issue?
Checking the covid19 gp_emis source confirmed that almost half of the records do have this code for MCHC.
The file has 3,304,808 lines in total, of which 1,553,334 (about 47%) contain the code '1022481000000109'.
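For anyone wanting to repeat this check, here is a minimal sketch of how the line count could be reproduced. It assumes the source is a plain text file read line by line and uses a simple substring match; the function names are made up for illustration, and a substring match could overcount if the code also appears inside a longer identifier.

```python
from typing import Iterable, Tuple

MCHC_CODE = "1022481000000109"  # SNOMED code quoted in this thread


def count_lines_with_code(lines: Iterable[str], code: str) -> Tuple[int, int]:
    """Return (total lines, lines containing `code`) for any iterable of lines."""
    total = 0
    matching = 0
    for line in lines:
        total += 1
        # Plain substring match; an assumption, not necessarily how the ETL filters.
        if code in line:
            matching += 1
    return total, matching


def share(total: int, matching: int) -> float:
    """Fraction of lines that matched, or 0.0 for an empty input."""
    return matching / total if total else 0.0
```

With a real file this would be called as `count_lines_with_code(open(path, encoding="utf-8"), MCHC_CODE)`, where `path` is wherever the gp_emis extract lives. Plugging in the counts reported above, `share(3304808, 1553334)` comes out at roughly 0.47.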
So the OMOP mapping is correct: it reflects the source data. The ETL even filters out some of the records.
This high frequency might be a source data quality issue.
In the measurement table, the concept MCHC - Mean corpuscular haemoglobin concentration (37393850) occurs 1.5M times, more than ten times as often as the second most frequent concept (Plasma HDL level).
We should investigate which source codes and fields map to this concept_id.
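One possible way to do that is to group the measurement records for this concept by their source value. The sketch below runs the query through Python's `sqlite3` against a tiny in-memory table so that it is self-contained; the column names `measurement_concept_id` and `measurement_source_value` are standard OMOP CDM fields, but the rows, the helper names, and the use of SQLite are assumptions for illustration only, not the actual database or data.

```python
import sqlite3

# Count records per source value for one target concept, most frequent first.
QUERY = """
SELECT measurement_source_value, COUNT(*) AS n_records
FROM measurement
WHERE measurement_concept_id = ?
GROUP BY measurement_source_value
ORDER BY n_records DESC, measurement_source_value
"""


def source_codes_for_concept(conn: sqlite3.Connection, concept_id: int):
    """Return [(source_value, n_records), ...] for the given concept_id."""
    return conn.execute(QUERY, (concept_id,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build a toy measurement table; the rows are invented, not real data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement ("
        "measurement_concept_id INTEGER, measurement_source_value TEXT)"
    )
    rows = [(37393850, "1022481000000109")] * 3
    rows += [(37393850, "hypothetical-other-code"), (1, "unrelated")]
    conn.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    return conn
```

Against the real CDM, the same `SELECT` could be run directly in the database client, optionally adding `measurement_source_concept_id` to the grouping to see which source concepts are involved.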