EHDEN / ETL-UK-Biobank

ETL UK-Biobank
https://ehden.github.io/ETL-UK-Biobank/

Add mapping table for Read 2 DRUG codes found in gp_prescriptions #108

Closed: alepev closed this issue 3 years ago

alepev commented 3 years ago

Read v2 drug codes are available neither in Athena nor in the NHS tables. We should obtain this information from another source (note: help from the collaborators is expected for this issue).

Ideally, these codes should be mapped to a valid Athena ontology, from which the standard concept_id of each equivalent term can be retrieved. The mapping to standard concept_id could be done programmatically during ETL execution by querying the vocabulary tables, rather than being stored in the mapping table itself.
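A minimal sketch of the "resolve at ETL time" idea, assuming the mapping table stores a source vocabulary_id and concept_code, and that the standard target is found through the OMOP 'Maps to' relationship in CONCEPT_RELATIONSHIP. It uses an in-memory SQLite database with a tiny hypothetical subset of the vocabulary tables; the concept_ids and codes are placeholders, not real Athena content, and the function names are our own.

```python
import sqlite3

# Reduced versions of the OMOP CONCEPT and CONCEPT_RELATIONSHIP tables
# (only the columns needed for this lookup).
SCHEMA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_code TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT,
    invalid_reason TEXT
);
"""

# Follow a valid 'Maps to' link from a source concept to a standard ('S') concept.
MAPS_TO_SQL = """
SELECT target.concept_id
FROM concept AS source
JOIN concept_relationship AS cr
  ON cr.concept_id_1 = source.concept_id
 AND cr.relationship_id = 'Maps to'
 AND cr.invalid_reason IS NULL
JOIN concept AS target
  ON target.concept_id = cr.concept_id_2
 AND target.standard_concept = 'S'
WHERE source.vocabulary_id = ? AND source.concept_code = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a toy vocabulary with one hypothetical source -> standard link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?)",
        [
            (1, "SRC-A", "dm+d", None),   # hypothetical non-standard source concept
            (2, "STD-A", "RxNorm", "S"),  # hypothetical standard target concept
        ],
    )
    conn.execute("INSERT INTO concept_relationship VALUES (1, 2, 'Maps to', NULL)")
    return conn


def standard_concept_ids(conn: sqlite3.Connection, vocabulary_id: str, concept_code: str) -> list:
    """Return the standard concept_ids reached via 'Maps to' (empty if unmapped)."""
    rows = conn.execute(MAPS_TO_SQL, (vocabulary_id, concept_code)).fetchall()
    return [row[0] for row in rows]
```

For example, `standard_concept_ids(build_demo_db(), "dm+d", "SRC-A")` returns `[2]`. Keeping only the source code in the mapping table means a vocabulary update changes the standard targets without editing the table.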

MaximMoinat commented 3 years ago

Possibility: use the existing dm+d mapping as an intermediate.
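Using dm+d as an intermediate amounts to composing two lookups: Read v2 to dm+d, then dm+d to standard concept_ids. A minimal sketch, assuming both hops are available as plain dictionaries; the function name and the example codes are hypothetical placeholders.

```python
from typing import Dict, List


def map_via_intermediate(
    read2_to_dmd: Dict[str, str],
    dmd_to_standard: Dict[str, List[int]],
    read2_code: str,
) -> List[int]:
    """Return the standard concept_ids reached through the dm+d hop,
    or an empty list if either hop has no entry."""
    dmd_code = read2_to_dmd.get(read2_code)
    if dmd_code is None:
        return []
    # Copy so callers cannot mutate the lookup table through the result.
    return list(dmd_to_standard.get(dmd_code, []))
```

A Read code only gets a target if both hops exist, so any gaps in the Read v2 to dm+d step would show up directly as unmapped codes.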

alepev commented 3 years ago

We managed to match all Read v2 drug codes in the ScanReport with a description extracted from UK Biobank itself, and let Usagi perform automated mappings to RxNorm and RxNorm Extension. These mappings still need manual review, but our impression is that they are generally accurate.
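The matching step before Usagi is essentially a join of the ScanReport codes with the UK Biobank description lookup. A minimal sketch, assuming the ScanReport values are available as (code, frequency) pairs and the descriptions as a code-to-text dictionary; the function name and the CSV column names are placeholders chosen for illustration, to be aligned with whatever columns are selected when importing into Usagi.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def build_usagi_input(
    scan_report_counts: Iterable[Tuple[str, int]],
    descriptions: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Join ScanReport codes with their descriptions and write a CSV
    (code, name, frequency). Codes without a description are returned
    separately so they can be checked by hand."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["source_code", "source_name", "frequency"])
    unmatched: List[str] = []
    for code, freq in scan_report_counts:
        name = descriptions.get(code)
        if name is None:
            unmatched.append(code)
            continue
        writer.writerow([code, name, freq])
    return out.getvalue(), unmatched
```

An empty `unmatched` list corresponds to the situation described here, where every ScanReport code found a description.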

alepev commented 3 years ago

NOTE: the mapping only includes the 566 Read codes seen in the ScanReport for the read_2 column of gp_prescriptions (cutoff: 5 occurrences). Some of these codes are redundant, i.e. they appear both with and without trailing zeroes (e.g. blc1.00 / blc1.), so the actual number of unique mapped codes is lower. We could map more codes to standard OMOP concept_ids as needed, but not the whole original UKB file containing the code descriptions, since it has 67612 entries.
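The redundant variants can be collapsed before counting unique codes. A minimal sketch, assuming "trailing zeroes" means a 7-character code ending in the `00` term code (as in `blc1.00` vs `blc1.`), and that the 5-occurrence cutoff applies to each raw code as listed in the ScanReport; both function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def normalize_read2(code: str) -> str:
    """Drop a trailing '00' from a 7-character Read v2 code
    (e.g. 'blc1.00' -> 'blc1.'); any other code is returned unchanged."""
    code = code.strip()
    if len(code) == 7 and code.endswith("00"):
        return code[:5]
    return code


def unique_codes(counts: Iterable[Tuple[str, int]], min_count: int = 5) -> Dict[str, int]:
    """Keep raw codes seen at least `min_count` times, then merge variants
    that differ only by the trailing '00', summing their counts."""
    merged: Dict[str, int] = {}
    for code, n in counts:
        if n < min_count:
            continue
        key = normalize_read2(code)
        merged[key] = merged.get(key, 0) + n
    return merged
```

Codes with other two-character suffixes (e.g. `blc1.11`) are deliberately left alone here, since it is not clear from this thread that they are equivalent to the base code.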