Closed razekmh closed 1 month ago
An R script has been developed that maps OMOP drug_exposure to RAMSES drug_prescriptions. The public Synthetic Patient Data in OMOP dataset was used as the OMOP source.
The following steps show how to access chunks of the drug_exposure and concept tables required for the mapping into RAMSES drug_prescriptions:
1- Load required libraries:
library(bigrquery)
library(DBI)
library(gargle)
2- Authenticate connection to BigQuery
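sa_key_path <- "path/to/your/key_file"
bq_auth(path = sa_key_path)
project_id <- "your_project_id"
# The tables live in the bigquery-public-data project; queries are billed to your own project
con <- dbConnect(bigquery(), project = "bigquery-public-data", dataset = "cms_synthetic_patient_data_omop", billing = project_id)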
3- SQL queries for retrieving data from drug_exposure and concept tables:
sql_string1 <- "SELECT * FROM bigquery-public-data.cms_synthetic_patient_data_omop.drug_exposure LIMIT 10000"
sql_string2 <- "SELECT * FROM bigquery-public-data.cms_synthetic_patient_data_omop.concept WHERE domain_id IN ('Drug', 'Route', 'Unit')"
result1 <- dbGetQuery(con, sql_string1)
result2 <- dbGetQuery(con, sql_string2)
4- Save data
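# write.csv() produces comma-separated files that read_csv() can read back later
write.csv(result1, file = "path/to/your/drug_exposure_file.csv", row.names = FALSE)
write.csv(result2, file = "path/to/your/concept_file.csv", row.names = FALSE)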
Note: the concept table has been filtered to the domain_id values 'Drug', 'Route', and 'Unit'.
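The 'Route' and 'Unit' concepts kept by this filter could be used to resolve route_concept_id and dose_unit_concept_id to readable names whenever those columns are populated. The following is a minimal sketch on toy data, not part of the developed script: the IDs, names, and the helper lookup_concept_name are all made up for illustration, and base R is used so the example runs without extra packages.

```r
# Toy stand-ins for the two tables; all IDs and names are made up for illustration.
concept <- data.frame(
  concept_id   = c(1L, 2L, 3L),
  concept_name = c("Oral", "Intravenous", "milligram"),
  domain_id    = c("Route", "Route", "Unit"),
  stringsAsFactors = FALSE
)
drug_exposure <- data.frame(
  drug_exposure_id     = c(101L, 102L, 103L),
  route_concept_id     = c(1L, 2L, NA),
  dose_unit_concept_id = c(3L, NA, 3L)
)

# Hypothetical helper: look up concept names for a vector of concept IDs,
# restricted to one domain. Missing or unmatched IDs give NA.
lookup_concept_name <- function(ids, concept, domain) {
  in_domain <- concept[concept$domain_id == domain, ]
  in_domain$concept_name[match(ids, in_domain$concept_id)]
}

drug_exposure$route <- lookup_concept_name(drug_exposure$route_concept_id, concept, "Route")
drug_exposure$units <- lookup_concept_name(drug_exposure$dose_unit_concept_id, concept, "Unit")

print(drug_exposure[, c("drug_exposure_id", "route", "units")])
```

Rows whose concept ID is NA simply keep NA for the resolved name, so the same lookup would degrade gracefully on a dataset where these columns are empty.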
The developed script performs the mapping and transformation of drug exposure data from the OMOP format into a RAMSES-compatible format. Let's go through the code block by block:
# Load necessary libraries
library(dplyr)
library(readr)
library(AMR) # For drug name mapping
dplyr: For data manipulation.
readr: To read and write CSV files.
AMR: For mapping drug names to standardised names using antimicrobial resistance data.
# Load data from CSV files
drug_exposure <- read_csv("/path/to/cleaned_drug_exposure.csv")
concept <- read_csv("path/to/cleaned_concept.csv")
drug_exposure.csv: Contains information about patient drug exposure events.
concept.csv: Contains concept information, including drug names, related to the drugs prescribed.
# Print the column names to check if they are correct
print(colnames(concept))
# Print the cleaned data to verify everything looks correct
print(head(concept))
# Ensure the relevant columns are available in drug_exposure
print(colnames(drug_exposure))
print(head(drug_exposure))
This prints the column names and first rows of the concept and drug_exposure datasets to ensure they are correct.
# Mapping using left join
drug_exposure <- drug_exposure %>%
  left_join(concept, by = c("drug_concept_id" = "concept_id")) %>%
  rename(drug_name = concept_name) %>% # Rename concept_name to drug_name
  mutate(route = NA) # Since route_concept_id is NA.
This joins the drug_exposure table with the concept table on matching drug_concept_id and concept_id. After joining, it renames the concept_name column to drug_name for clarity and initialises a route column with NA, since route_concept_id is missing in this dataset.
# Remove unnecessary columns from concept table
drug_exposure <- drug_exposure %>%
  select(-domain_id, -vocabulary_id, -concept_class_id, -standard_concept,
         -concept_code, -valid_start_date, -valid_end_date, -invalid_reason)
These columns (e.g., domain_id, vocabulary_id) are metadata and are not needed for further analysis or mapping.
Map dose_unit_source_value as units
# Map 'dose_unit_source_value' directly as units (since 'dose_unit_concept_id' is missing)
if ("dose_unit_source_value" %in% colnames(drug_exposure)) {
  drug_exposure <- drug_exposure %>%
    mutate(units = dose_unit_source_value) # Map 'dose_unit_source_value' to units
} else {
  drug_exposure$units <- NA # If 'dose_unit_source_value' is missing, set units as NA
}
This block checks whether dose_unit_source_value exists. If it does, it creates a units column from its values. If the column is missing, it sets units to NA. This ensures dose units are either captured or explicitly marked as missing.
# Map drug_exposure fields to RAMSES fields
omop_to_ramses <- drug_exposure %>%
  transmute(
    # Mapping OMOP person_id to RAMSES patient_id
    patient_id = person_id,
    # Mapping OMOP drug_exposure_id to RAMSES prescription_id
    prescription_id = drug_exposure_id,
    # Start and end dates of drug exposure
    prescription_start = drug_exposure_start_date,
    prescription_end = drug_exposure_end_date,
    # Mapping drug_name (joined via drug_concept_id) to RAMSES tr_DESC (drug description) using AMR package
    tr_DESC = ifelse(!is.na(AMR::ab_name(drug_name)), AMR::ab_name(drug_name), "Unknown drug"),
    # Route (e.g., IV, Oral); NA here because route_concept_id is not populated
    route = route,
    # Using 'quantity' as a proxy for dose if 'dose_value' is not available
    dose = quantity,
    # Units mapped from 'dose_unit_source_value'
    units = units,
    # Calculate duration between start and end dates in days
    duration_days = as.numeric(difftime(prescription_end, prescription_start, units = "days"))
  )
This block maps the drug_exposure table to a format compatible with the RAMSES model:
person_id from OMOP is mapped to patient_id in RAMSES.
drug_exposure_id from OMOP is mapped to prescription_id.
The drug_name column is mapped to a drug description using the AMR package. If the AMR package cannot resolve the drug name, the row is labelled as "Unknown drug".
The route column (currently NA) is included.
The units column is included.
# Display the final mapped data from OMOP to RAMSES
print(omop_to_ramses)
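The duration_days calculation above can be checked in isolation. The sketch below uses made-up dates and assumes the start and end columns are already Date objects; it shows that difftime() returns 0 for a prescription that starts and ends on the same day and NA when the end date is missing.

```r
# Toy dates, made up for illustration.
prescription_start <- as.Date(c("2010-01-03", "2010-02-01", "2010-03-15"))
prescription_end   <- as.Date(c("2010-01-10", "2010-02-01", NA))

# Same expression as in the mapping step: difference in whole days
duration_days <- as.numeric(difftime(prescription_end, prescription_start, units = "days"))

print(duration_days)
```

If an inclusive day count is wanted instead (so that a same-day prescription counts as one day), adding 1 to duration_days would be one option. Whether RAMSES expects that convention is not stated in this thread.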
# Validation function for checking mappings
validate_mapping <- function(df) {
  # Unresolved names were replaced with "Unknown drug" above, so check for that label as well as NA
  unmapped <- df %>% filter(is.na(tr_DESC) | tr_DESC == "Unknown drug")
  if (nrow(unmapped) == 0) {
    message("All drugs successfully mapped to RAMSES fields!")
  } else {
    message("Some drug mappings failed. Please check the following:")
    print(unmapped)
  }
}
# Run the validation function
validate_mapping(omop_to_ramses)
The validation function checks whether every drug was mapped to the tr_DESC field (i.e., no NA or "Unknown drug" values in the tr_DESC column). If all drugs were mapped correctly, a success message is printed. If not, it identifies and prints the rows where drug mapping failed.
# Save the final mapped data to a CSV file
write_csv(omop_to_ramses, "./path/to/mapped_drug_prescriptions.csv")
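Because unresolved names are labelled "Unknown drug" in the mapping step, a tally of that label gives a quick view of how much of the output is affected and which source names are responsible. This is a minimal sketch on toy data: the drug names are made up, and source_name is a hypothetical column holding the original concept name (the transmute in the script does not keep drug_name, so it would need to be carried through for this to work on real output).

```r
# Toy stand-in for the mapped output; names and labels are made up for illustration.
mapped <- data.frame(
  source_name = c("amoxicillin 500 MG Oral Capsule",
                  "simvastatin 20 MG Oral Tablet",
                  "simvastatin 20 MG Oral Tablet",
                  "ciprofloxacin 250 MG Oral Tablet"),
  tr_DESC     = c("Amoxicillin", "Unknown drug", "Unknown drug", "Ciprofloxacin"),
  stringsAsFactors = FALSE
)

# A row counts as unmapped if tr_DESC is NA or carries the placeholder label
is_unmapped <- is.na(mapped$tr_DESC) | mapped$tr_DESC == "Unknown drug"

# Share of rows that fell back to the placeholder
unmapped_share <- mean(is_unmapped)

# Source names behind the failures, most frequent first
unmapped_counts <- sort(table(mapped$source_name[is_unmapped]), decreasing = TRUE)

print(unmapped_share)
print(unmapped_counts)
```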
The code starts by loading necessary libraries and data, verifying the structure of the data, and then transforming it from OMOP format to RAMSES format using joins, renaming, and custom mappings. It also includes validation to ensure successful mapping, and finally, it saves the transformed data.
The key aspects of the transformation involve:
Joining drug_exposure with the concept table.
Mapping drug names using the AMR package.
During the process of mapping OMOP drug exposure data to RAMSES, several issues were encountered that led to incomplete or missing mappings. Notably, some drugs were not mapped correctly, resulting in entries being labelled as "Unknown drug" in the final dataset. This was primarily due to:
Missing or incomplete drug_concept_id mappings.
Reliance on the AMR package for drug name resolution, which may not cover all drugs in the dataset, especially those that are non-antimicrobial.
Missing route_concept_id, dose_unit_concept_id, and other essential columns that could have provided more complete data for fields like route, dose, and units.
Well done @zsenousy. This is great work. Would it be okay to push your code to the branch and resolve this issue? I think #27 could use a lot of the functions you built for this.
A pull request has been created for this code addition.
Extract data from the example OMOP data to fill the drug_prescriptions table from the validate article. This is a split from #17.
Please feel free to assign yourself to the issue. Please use the respective branch for development.