EHDEN / ETL-UK-Biobank

ETL UK-Biobank
https://ehden.github.io/ETL-UK-Biobank/

Do not ignore EMIS records with value -9999999 or -9000099 #339

Closed MaximMoinat closed 2 years ago

MaximMoinat commented 2 years ago

For Deep Vein Thrombosis, a lot of records were missing. Vaclav found the following:

When I search for concept 4133004 in OMOP, there seem to be just two records from the EMIS clinical table. However, when I search for SNOMED code 128053003 (which, by the way, is nowhere in the CTV3-SNOMED mapping) in the EMIS table, I get 12.5k records. When I checked a few records from both the EMIS and TPP tables, all values were 0 and NULL for TPP and -9999999 for EMIS. Only two EMIS records are different, and these two are transformed to OMOP.

For TPP, there is no filter on the value. So this should work (if the CTV3 mapping to SNOMED is fixed).

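For EMIS, we filter out any record that has a negative value. This is too stringent, because it also drops records whose value is only an administrative placeholder such as -9999999.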

We have to investigate the different EMIS administrative codes. The following are found in the scan report: -9000004, -9000003, -9000002, -9000001, -9999999, -9000099

MaximMoinat commented 2 years ago

Meaning of -9xxxxx codes:

| coding | meaning |
| --- | --- |
| -9999999 | empty or non-numeric value/unit |
| -9000099 | associated with empty or 0 clinical code |
| -9000004 | redacted - incorrect READ2 |
| -9000003 | redacted - incorrect SNOMED |
| -9000002 | redacted - rare occupation |
| -9000001 | redacted - potentially sensitive or identifying |

Decision: only filter out the redacted codes -9000001, -9000002, -9000003 and -9000004. Records with -9999999 or -9000099 are kept.
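
For illustration, a minimal sketch of what that filter could look like. This is not the actual ETL code: `REDACTED_CODES` and `keep_emis_record` are hypothetical names, and keeping empty or non-numeric values is an assumption that follows from the decision above rather than something confirmed in this thread.

```python
# Hypothetical sketch, not the project's real transformation code.
# The four "redacted" administrative codes from the table above.
REDACTED_CODES = {-9000001, -9000002, -9000003, -9000004}


def keep_emis_record(value):
    """Return True if an EMIS record with this value should be kept.

    Only the redacted codes are dropped. Other negative values,
    including -9999999 and -9000099, no longer exclude the record.
    """
    if value is None:
        # Assumption: a missing value is not a reason to drop the record.
        return True
    try:
        numeric = float(value)
    except (TypeError, ValueError):
        # Assumption: non-numeric values are kept as well.
        return True
    return numeric not in REDACTED_CODES
```

With this check, a Deep Vein Thrombosis record carrying -9999999 passes the filter, while a record carrying -9000003 is still dropped. How the placeholder value itself should be stored in OMOP (for example as NULL) is a separate question that this sketch does not address.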