EHDEN / ETL-UK-Biobank

ETL UK-Biobank
https://ehden.github.io/ETL-UK-Biobank/

Handle minus codes for gp covid tables #262

Closed MaximMoinat closed 3 years ago

MaximMoinat commented 3 years ago

The code columns in the following tables contain minus codes. These codes probably have a special meaning, and we should determine whether to include them in the mapping or skip them.

- covid19_emis_gp_clinical_to_stem_table: -1, -2, -3, -4, -99
- covid19_emis_gp_scripts_to_drug_exposure: -4
- covid19_tpp_gp_scripts_to_drug_exposure: -1, -2

MaximMoinat commented 3 years ago

Documentation Covid19 GP tables: https://drive.google.com/file/d/11Y31B_ERqqgCECTimFOM-VFMSDl0G0Nd/view?usp=sharing

egarcialara commented 3 years ago

I searched for those codes in the original tables from the synthetic data set. Are you referring to a different one?

covid19_emis_gp_clinical

Unique combinations of code_type and code:

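| code_type | Meaning of code_type | code | Meaning of code | Decision |
| --- | --- | --- | --- | --- |
| -99 | Redacted – missing | -99 | | remove |
| -99 | Redacted – missing | -1 | | remove |
| -99 | Redacted – missing | -4 | | remove |
| -4 | Redacted – incorrect READ2 | -99 | | remove |
| -4 | Redacted – incorrect READ2 | -1 | | remove |
| -1 | Redacted – potentially sensitive or identifying | -99 | | remove |
| -1 | Redacted – potentially sensitive or identifying | -1 | | remove |
| -1 | Redacted – potentially sensitive or identifying | -4 | | remove |
| 2 | SNOMED CT | -99 | | I'd say this combination doesn't exist in real data |
| 2 | SNOMED CT | -1 | | I'd say this combination doesn't exist in real data |
| 2 | SNOMED CT | -4 | | I'd say this combination doesn't exist in real data |
| 3 | Local EMIS code | -99 | Redacted – missing | remove |
| 3 | Local EMIS code | -1 | Redacted – potentially sensitive or identifying | remove |
| 3 | Local EMIS code | -4 | Redacted – incorrect READ2 | remove |
| 5 | EMIS online test request code | -99 | | I'd say this combination doesn't exist in real data |
| 5 | EMIS online test request code | -1 | | I'd say this combination doesn't exist in real data |
| 5 | EMIS online test request code | -4 | | I'd say this combination doesn't exist in real data |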

Source code_type: https://biobank.ndph.ox.ac.uk/showcase/coding.cgi?id=3175

Source code, Local EMIS code: https://biobank.ndph.ox.ac.uk/showcase/coding.cgi?id=7689
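For reference, a tally of code_type/code combinations like the one above could be produced roughly as follows. This is only a minimal sketch: the helper names `is_minus_code` and `tally_minus_combinations` and the sample rows are hypothetical, and it assumes each record is available as a dict with `code_type` and `code` keys.

```python
from collections import Counter


def is_minus_code(value):
    """Return True if value parses as a negative integer, e.g. '-99'."""
    try:
        return int(value) < 0
    except (TypeError, ValueError):
        # Non-numeric codes (e.g. local EMIS codes) and missing values
        # are not treated as minus codes.
        return False


def tally_minus_combinations(rows):
    """Count each (code_type, code) pair where at least one value is a minus code."""
    counts = Counter()
    for row in rows:
        code_type, code = row["code_type"], row["code"]
        if is_minus_code(code_type) or is_minus_code(code):
            counts[(str(code_type), str(code))] += 1
    return counts


# Hypothetical sample rows, not taken from the UK Biobank data.
sample_rows = [
    {"code_type": "2", "code": "123456789"},
    {"code_type": "-99", "code": "-99"},
    {"code_type": "3", "code": "-1"},
    {"code_type": "3", "code": "-1"},
    {"code_type": "3", "code": "EMISNQ12"},
]

for combination, n in sorted(tally_minus_combinations(sample_rows).items()):
    print(combination, n)
```

Parsing with `int()` rather than checking for a leading `-` keeps alphanumeric local codes from being misread as minus codes.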

covid19_emis_gp_scripts

[none found]

covid19_tpp_gp_scripts

| Coding | Meaning | Decision |
| --- | --- | --- |
| -1 | No dm+d code | remove |
| -2 | Mapped to multiple dm+d codes | ??? |

Source: https://biobank.ndph.ox.ac.uk/showcase/coding.cgi?id=4214

MaximMoinat commented 3 years ago

Our suggestion is to remove the records with a minus code. @spiros Do you agree or is there an analytical reason to keep these records?

spiros commented 3 years ago

I've had a look at the raw data, and while some of the combinations in the table above occur a significant number of times, I also suggest we drop them, as there's not much we can do with them at this stage even if we keep them.
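The agreed approach, dropping any record that carries a minus code, could be sketched as below. This is an illustration under assumptions, not the project's actual transformation code: the function names are hypothetical, records are assumed to be dicts, and the code columns are assumed to be named `code_type` and `code` (the scripts tables may use different column names).

```python
def is_minus_code(value):
    """Return True if value parses as a negative integer, e.g. '-4'."""
    try:
        return int(value) < 0
    except (TypeError, ValueError):
        # Alphanumeric codes and missing values are kept.
        return False


def drop_minus_code_rows(rows, code_columns=("code_type", "code")):
    """Return (kept_rows, n_dropped), dropping rows with a minus code in any code column."""
    kept = [
        row for row in rows
        if not any(is_minus_code(row.get(column)) for column in code_columns)
    ]
    return kept, len(rows) - len(kept)


# Hypothetical sample rows, not taken from the UK Biobank data.
sample_rows = [
    {"code_type": "2", "code": "123456789"},
    {"code_type": "-99", "code": "-99"},
    {"code_type": "3", "code": "-1"},
    {"code_type": "3", "code": "EMISNQ12"},
]

kept_rows, n_dropped = drop_minus_code_rows(sample_rows)
print(f"kept {len(kept_rows)} rows, dropped {n_dropped}")
```

Returning the number of dropped rows makes it easy to log how many records each table loses, which is useful given that some combinations occur frequently in the raw data.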