SAFEHR-data / ramses-package

R Package for Data-Driven Antimicrobial Stewardship & Surveillance in Hospitals
https://ramses-antibiotics.web.app/
GNU General Public License v3.0

Map OMOP data to `drug_prescriptions` table #25

Closed razekmh closed 1 month ago

razekmh commented 1 month ago

Extract data from the example OMOP data to fill the drug_prescriptions table from the validate article. This is a split from #17.

Please feel free to assign yourself to the issue. Please use the respective branch for development.

zsenousy commented 1 month ago

An R script has been developed that maps the OMOP drug_exposure table to the RAMSES drug_prescriptions table. Synthetic Patient Data in OMOP, a public BigQuery dataset, was used as the OMOP source data.

The following steps show how to extract the subsets of the drug_exposure and concept tables required for the mapping into RAMSES drug_prescriptions:

1- Load required libraries:

library(bigrquery)
library(DBI)
library(gargle)

2- Authenticate connection to BigQuery

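sa_key_path <- "path/to/your/key_file"

bq_auth(path = sa_key_path)

project_id <- "your_project_id"

# The public data lives in the bigquery-public-data project; queries are billed to your own project
con <- dbConnect(bigquery(), project = "bigquery-public-data", dataset = "cms_synthetic_patient_data_omop", billing = project_id)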

3- SQL queries for retrieving data from drug_exposure and concept tables:

sql_string1 <- "SELECT * FROM bigquery-public-data.cms_synthetic_patient_data_omop.drug_exposure LIMIT 10000"
sql_string2 <- "SELECT * FROM bigquery-public-data.cms_synthetic_patient_data_omop.concept WHERE domain_id IN ('Drug', 'Route', 'Unit')"
result1 <- dbGetQuery(con, sql_string1)
result2 <- dbGetQuery(con, sql_string2)

4- Save data

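# write.csv produces comma-separated files that read_csv() can load later
write.csv(result1, file = "path/to/your/drug_exposure_file.csv", row.names = FALSE)
write.csv(result2, file = "path/to/your/concept_file.csv", row.names = FALSE)

Note: The concept table has been filtered to the domain_id values 'Drug', 'Route', and 'Unit'.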
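
Before saving, a quick check can confirm that the concept extract contains only the three filtered domains and that the file survives a CSV round trip. The following is a minimal sketch in base R; `concept_extract` is a hypothetical stand-in for `result2`, and its rows are made up for illustration rather than taken from the dataset.

```r
# Toy stand-in for `result2`; the rows are illustrative, not real query output
concept_extract <- data.frame(
  concept_id   = c(1L, 2L, 3L),
  concept_name = c("Example drug", "Example route", "Example unit"),
  domain_id    = c("Drug", "Route", "Unit"),
  stringsAsFactors = FALSE
)

# Confirm the filter returned only the expected domains
expected_domains <- c("Drug", "Route", "Unit")
unexpected <- setdiff(unique(concept_extract$domain_id), expected_domains)
stopifnot(length(unexpected) == 0)

# Write comma-separated output without row names, then read it back
tmp_file <- tempfile(fileext = ".csv")
write.csv(concept_extract, file = tmp_file, row.names = FALSE)
round_trip <- read.csv(tmp_file, stringsAsFactors = FALSE)

# Column names and row count should survive the round trip
stopifnot(identical(colnames(round_trip), colnames(concept_extract)))
stopifnot(nrow(round_trip) == nrow(concept_extract))
```

The separator matters here: `read_csv()` in the mapping script expects comma-separated input, while `write.table()` defaults to a space separator unless `sep = ","` is given.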

zsenousy commented 1 month ago

The developed script performs the mapping and transformation of drug exposure data from the OMOP format into a RAMSES-compatible format. Let's go through the code block by block:


1. Loading Libraries

# Load necessary libraries
library(dplyr)
library(readr)
library(AMR)  # For drug name mapping

2. Loading Data

# Load data from CSV files
drug_exposure <- read_csv("/path/to/cleaned_drug_exposure.csv")
concept <- read_csv("path/to/cleaned_concept.csv")

3. Print Column Names and Data Preview

# Print the column names to check if they are correct
print(colnames(concept))
# Print the cleaned data to verify everything looks correct
print(head(concept))

# Ensure the relevant columns are available in drug_exposure
print(colnames(drug_exposure))
print(head(drug_exposure))  

4. Joining Drug Data and Concept Table

# Attach concept names to the drug exposures with a left join
drug_exposure <- drug_exposure %>%
  left_join(concept, by = c("drug_concept_id" = "concept_id")) %>%
  rename(drug_name = concept_name) %>%  # Rename concept_name to drug_name
  mutate(route = NA)  # Placeholder: route_concept_id is NA in this extract
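
If route_concept_id were populated in another OMOP extract, the route could be looked up in the concept table in the same way as the drug name. The following is a rough sketch of that idea in base R, not part of the developed script; the data frames `drug_exposure_toy` and `concept_toy` and all IDs and names in them are hypothetical.

```r
# Toy stand-ins; IDs and names are illustrative, not real OMOP concepts
drug_exposure_toy <- data.frame(
  drug_exposure_id = c(101L, 102L, 103L),
  drug_concept_id  = c(1L, 2L, 1L),
  route_concept_id = c(10L, NA, 11L)
)
concept_toy <- data.frame(
  concept_id   = c(1L, 2L, 10L, 11L),
  concept_name = c("Drug A", "Drug B", "Oral", "Intravenous"),
  domain_id    = c("Drug", "Drug", "Route", "Route"),
  stringsAsFactors = FALSE
)

# Build a lookup restricted to Route concepts
route_lookup <- concept_toy[concept_toy$domain_id == "Route",
                            c("concept_id", "concept_name")]
names(route_lookup) <- c("route_concept_id", "route")

# Left join: keep every exposure row, even when the route is unknown
with_route <- merge(drug_exposure_toy, route_lookup,
                    by = "route_concept_id", all.x = TRUE)
with_route <- with_route[order(with_route$drug_exposure_id), ]

print(with_route[, c("drug_exposure_id", "route")])
```

Rows without a matching route concept keep `NA` in `route`, which matches how the script currently fills the column.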

5. Removing Unnecessary Columns

# Remove unnecessary columns from concept table
drug_exposure <- drug_exposure %>%
  select(-domain_id, -vocabulary_id, -concept_class_id, -standard_concept, 
         -concept_code, -valid_start_date, -valid_end_date, -invalid_reason)

6. Mapping dose_unit_source_value as Units

# Map 'dose_unit_source_value' directly as units (since 'dose_unit_concept_id' is missing)
if ("dose_unit_source_value" %in% colnames(drug_exposure)) {
  drug_exposure <- drug_exposure %>%
    mutate(units = dose_unit_source_value)  # Map 'dose_unit_source_value' to units
} else {
  drug_exposure$units <- NA  # If 'dose_unit_source_value' is missing, set units as NA
}

7. Mapping OMOP Fields to RAMSES Format

# Map drug_exposure fields to RAMSES fields
omop_to_ramses <- drug_exposure %>%
  transmute(
    # Mapping OMOP person_id to RAMSES patient_id
    patient_id = person_id,

    # Mapping OMOP drug_exposure_id to RAMSES prescription_id
    prescription_id = drug_exposure_id,

    # Start and end dates of drug exposure
    prescription_start = drug_exposure_start_date,
    prescription_end = drug_exposure_end_date,

    # Mapping drug_name (looked up from drug_concept_id) to RAMSES tr_DESC using the AMR package;
    # names that AMR cannot resolve are labelled "Unknown drug"
    tr_DESC = coalesce(AMR::ab_name(drug_name), "Unknown drug"),

    # Route (e.g., IV, Oral) from concept table
    route = route,

    # Using 'quantity' as a proxy for dose if 'dose_value' is not available
    dose = quantity,

    # Units mapped from 'dose_unit_source_value'
    units = units,

    # Calculate duration between start and end dates in days
    duration_days = as.numeric(difftime(prescription_end, prescription_start, units = "days"))
  )
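
The duration calculation can be checked on a few hand-picked dates. This is a small worked example in base R with made-up dates; it shows that a same-day exposure yields 0 days and that a missing end date propagates as NA. Whether RAMSES expects an inclusive count (adding 1 day) is an assumption to verify against the RAMSES documentation.

```r
# Made-up start and end dates for three exposures
start_dates <- as.Date(c("2020-01-03", "2020-02-01", "2020-03-15"))
end_dates   <- as.Date(c("2020-01-10", "2020-02-01", NA))

# Same calculation as in the transmute() step
duration_days <- as.numeric(difftime(end_dates, start_dates, units = "days"))
print(duration_days)
# 7 0 NA

# Hypothetical inclusive variant, if both start and end day should count
duration_inclusive <- duration_days + 1
```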

8. Displaying Final Data

# Display the final mapped data from OMOP to RAMSES
print(omop_to_ramses)

9. Validation Function

# Validation function for checking mappings
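validate_mapping <- function(df) {
  # Failed lookups are labelled "Unknown drug" during mapping, so test for that label as well as NA
  unmapped <- is.na(df$tr_DESC) | df$tr_DESC == "Unknown drug"
  if (!any(unmapped)) {
    message("All drugs successfully mapped to RAMSES fields!")
  } else {
    message("Some drug mappings failed. Please check the following:")
    print(df[unmapped, ])
  }
}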

# Run the validation function
validate_mapping(omop_to_ramses)
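
Because the mapping step labels failed lookups as "Unknown drug" rather than leaving them as NA, it can help to summarise which source names failed and how often. The following is a minimal sketch in base R; `mapped_toy` is a hypothetical data frame that keeps the source `drug_name` next to `tr_DESC`, and its values are invented for illustration.

```r
# Toy data: source names alongside the mapped description
mapped_toy <- data.frame(
  drug_name = c("Source name A", "Source name B", "Source name B", "Source name C"),
  tr_DESC   = c("Amoxicillin", "Unknown drug", "Unknown drug", "Unknown drug"),
  stringsAsFactors = FALSE
)

# Flag rows where the lookup failed
is_unmapped <- mapped_toy$tr_DESC == "Unknown drug"

# Share of rows that could not be mapped
unmapped_share <- mean(is_unmapped)

# Source names that failed, most frequent first
unmapped_counts <- sort(table(mapped_toy$drug_name[is_unmapped]), decreasing = TRUE)

print(unmapped_share)
print(unmapped_counts)
```

Keeping `drug_name` in the output until this check has run makes it easier to see which OMOP concept names need manual attention.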

10. Saving the Final Data

# save the final mapped data to a CSV file
write_csv(omop_to_ramses, "./path/to/mapped_drug_prescriptions.csv")
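
Before the file is handed to RAMSES, a couple of basic consistency checks on the mapped table may be worthwhile. The sketch below, in base R on invented data, looks for duplicated prescription IDs and for prescriptions that end before they start. These checks are suggestions under the assumption that IDs should be unique and dates ordered; they are not RAMSES's own validation rules.

```r
# Toy mapped table with two deliberate problems
prescriptions_toy <- data.frame(
  prescription_id    = c(1L, 2L, 2L),
  prescription_start = as.Date(c("2020-01-01", "2020-01-05", "2020-01-09")),
  prescription_end   = as.Date(c("2020-01-03", "2020-01-04", "2020-01-12"))
)

# IDs that appear more than once
ids <- prescriptions_toy$prescription_id
duplicated_ids <- unique(ids[duplicated(ids)])

# Row positions where the end date precedes the start date
ends_before_start <- which(
  prescriptions_toy$prescription_end < prescriptions_toy$prescription_start
)

print(duplicated_ids)
print(ends_before_start)
```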

Summary

The code starts by loading necessary libraries and data, verifying the structure of the data, and then transforming it from OMOP format to RAMSES format using joins, renaming, and custom mappings. It also includes validation to ensure successful mapping, and finally, it saves the transformed data.

The key aspects of the transformation involve:

  1. Mapping drug concept IDs to drug names using the concept table.
  2. Mapping dose units and calculating drug exposure duration.
  3. Converting drug names to standard descriptions using the AMR package.
  4. Exporting the transformed data for use in RAMSES.

zsenousy commented 1 month ago

Issues

During the process of mapping OMOP drug exposure data to RAMSES, several issues were encountered that led to incomplete or missing mappings. Notably, some drug standards were not mapped correctly, resulting in entries being labelled as "Unknown drug" in the final dataset. This was primarily due to:

razekmh commented 1 month ago

Well done @zsenousy, this is great work. Would it be okay to push your code to the branch and resolve this issue? I think #27 could use a lot of the functions you built for this.

zsenousy commented 1 month ago

A pull request has been created for this code addition.