SAFEHR-data / ramses-package

R Package for Data-Driven Antimicrobial Stewardship & Surveillance in Hospitals

https://ramses-antibiotics.web.app/

GNU General Public License v3.0

2 stars 0 forks source link

Mapping of OMOP CDM data to RAMSES DB #17

Closed zsenousy closed 2 months ago

zsenousy commented 2 months ago

Edit: @razekmh is modifying the description of this issue to list out the task and our current understanding.

The aim of this issue is to gain a full understanding of how does OMOP data model translate to RAMSES data model.

A more specific target would be to add OMOP data to RAMSES such that we would be able to export a report of the Defined Daily Dose (DDD) per drug? per 1000 bed days for both prescription and administration per ward and per specialisation

Naturally, achieving this task requires us to have access to an accurate data model for each the origin and target standards. We will tackle each separately here:

RAMSES data model is partially described in the article on Objects and classes in Ramses . The article does not explain how import the data into RAMSES. We can see a glimpse of the data expected by RAMSES in the article on Validate and load electronic health records. in the validation article we can see that RAMSES offers some tools to process the EHR data and prepare it to be consumed by RAMSES. Our understanding of what does RAMSES expects will develop as we replicate the steps in the article on OMOP data.
OMOP is a standard by itself which means we can read on the structure in the official website. However, the actual data structure we will work with depends on the implantation in UCLH. This is due to two facts:
- The standard is extendable (which means more tables and/or attributes can be added)
- The standard includes mandatory and optional attributes. This means some of the optional attributes might not be used in the UCLH implementation and thus cannot be used to extracted data.
We try approximate the expected data structure from OMOP using two sources:
- OMOCK
- OMOP Public data on Google big query
Each dataset/tool has its own limitations but we expect that working with them will help us at least partially understand the OMOP mapping to RAMSES while we are waiting for the access to the UCLH OMOP data. Accessing these dataset is described in issue #15

Here we describe the mapping of OMOP Data tables into RAMSES DB. We identify the mapping required between various attributes and the missing data items that need to be derived.

zsenousy commented 2 months ago

@razekmh @AngharadGreen

Initial mapping RAMSES - OMOP

1. Patient Information

OMOP CDM Tables:

PERSON
OBSERVATION_PERIOD
DEATH

RAMSES DB Equivalent:

patients

Mapping Details:

PERSON: The patients table in RAMSES corresponds directly with the OMOP PERSON table. Here, the person_id from OMOP is mapped to patient_id in RAMSES. Additional patient demographics such as gender_concept_id (OMOP) align with sex field in RAMSES.
OBSERVATION_PERIOD: The OMOP OBSERVATION_PERIOD table captures the time span during which the patient is observed (e.g., start and end dates). While RAMSES does not have a direct equivalent table, this can be inferred using the ward_start and ward_end fields in inpatient_ward_movements and/or the admission_date and discharge_date in inpatient_episodes. Any other thoughts?
DEATH: In OMOP, death_date is recorded in the DEATH table. This information is captured in RAMSES under date_of_death in the patients table.

2. Clinical Events

OMOP CDM Tables:

CONDITION_OCCURRENCE
DRUG_EXPOSURE
PROCEDURE_OCCURRENCE
DEVICE_EXPOSURE
OBSERVATION
MEASUREMENT

RAMSES DB Equivalent:

inpatient_diagnoses
drug_prescriptions
drug_therapy_episodes
inpatient_episodes
microbiology_specimens
microbiology_isolates
inpatient_investigations

Mapping Details:

CONDITION_OCCURRENCE: The inpatient_diagnoses table in RAMSES captures inpatient diagnoses and can be mapped to OMOP’s CONDITION_OCCURRENCE table. Here, condition_concept_id from OMOP corresponds to icd_code in RAMSES, with fields like diagnosis_position offering additional context.
DRUG_EXPOSURE: This OMOP table records drug exposure events and corresponds to drug_prescriptions and drug_therapy_episodes in RAMSES. Fields such as drug_concept_id (OMOP) map to ATC_code and prescription_text in RAMSES. Any thoughts?
PROCEDURE_OCCURRENCE: Procedures performed on patients in OMOP can be mapped to inpatient_episodes in RAMSES. While OMOP uses procedure_concept_id, in RAMSES, this information might be inferred using fields related to specific care episodes, such as episode_number and main_specialty_code.
DEVICE_EXPOSURE: OMOP’s DEVICE_EXPOSURE table captures data on devices used on patients. RAMSES does not have a direct equivalent for device data.
OBSERVATION: General observations in OMOP, not covered by other tables, can be mapped to inpatient_investigations in RAMSES, which records various types of observations conducted during inpatient care. Not sure about this, any thoughts?
MEASUREMENT: OMOP’s MEASUREMENT table captures laboratory results and vital signs. In RAMSES, this can be split between:
- microbiology_specimens: Captures the specimen details such as specimen_type_code, corresponding to lab samples.
- microbiology_isolates: Records specific organism-related findings, which is crucial for microbiological data.

3. Microbiology and Laboratory Data

OMOP CDM Tables:

MEASUREMENT
SPECIMEN

RAMSES DB Equivalent:

microbiology_isolates
microbiology_specimens

Mapping Details:

MEASUREMENT: As explained above, this is split between:
- microbiology_specimens: To capture details about the specimen collected.
- microbiology_isolates: To record specific lab results related to microbiology, such as organism codes and susceptibility results.
SPECIMEN: This table in OMOP directly corresponds to microbiology_specimens in RAMSES, mapping fields like specimen_concept_id to specimen_type_code.

4. Healthcare Encounters

OMOP CDM Tables:

VISIT_OCCURRENCE
CARE_SITE
PROVIDER

RAMSES DB Equivalent:

inpatient_ward_movements

Mapping Details:

VISIT_OCCURRENCE: This OMOP table captures each healthcare encounter, and it can be mapped to inpatient_ward_movements in RAMSES. Fields like visit_start_date and visit_end_date in OMOP align with ward_start and ward_end in RAMSES.
CARE_SITE and PROVIDER: RAMSES does not seem to have direct equivalents for these tables.

5. Missing or Unmapped Items

Gaps in RAMSES Database:

DEVICE_EXPOSURE: No direct equivalent table for device data, indicating a missing component.
OBSERVATION_PERIOD: Needs to be inferred as there is no direct mapping.
CARE_SITE and PROVIDER: No tables capturing this specific information. I believe this can be ignored but we need confirmation from medical professionals (Use cases), if this required.

AngharadGreen commented 2 months ago

Thank you for this @zsenousy. I'm creating a visualisation of the OMOP and RAMSES mappings as well

razekmh commented 2 months ago

I am using omock to match the OMOP data to Ramses. My target in the mapping is to match the structure of the dummy data Ramses is using in the validate article. I am starting the matching with the two table drug_prescriptions and drug_administrations. There are more tables in the dummy data but I aiming for these two in the first instance.

> str(drug_prescriptions)
'data.frame':   367 obs. of  12 variables:
 $ patient_id        : chr  "5124578766" "4874231672" "6292626973" "6292626973" ...
 $ prescription_id   : chr  "66cac1c5eab88d72c8b7687966357f5b" "8ccd67f4730b62ceafb8bcb27996c10c" "72cf4b592b0f4e2143b4bb9d7c569c97" "806c86b55cf50505a20b722f081c4075" ...
 $ rxsummary         : chr  "Piperacillin / Tazobactam IVI 4.5 g TDS" "Ciprofloxacin ORAL 500 mg BD" "Flucloxacillin ORAL 500 mg 6H" "Metronidazole ORAL 400 mg TDS" ...
 $ authoring_date    : POSIXct, format: "2015-08-04 13:07:16" "2017-07-06 09:49:31" "2017-01-13 17:34:41" "2017-11-12 09:09:13" ...
 $ prescription_start: POSIXct, format: "2015-08-04 14:45:16" "2017-07-06 10:26:31" "2017-01-13 18:48:41" "2017-11-12 09:44:13" ...
 $ prescription_end  : POSIXct, format: "2015-08-07 14:45:16" "2017-07-07 22:26:31" "2017-01-17 18:48:41" "2017-11-14 09:44:13" ...
 $ tr_DESC           : chr  "Piperacillin / Tazobactam" "Ciprofloxacin" "Flucloxacillin" "Metronidazole" ...
 $ route             : chr  "IV" "ORAL" "ORAL" "ORAL" ...
 $ dose              : num  4.5 500 500 400 4.5 500 4.5 4.5 600 2 ...
 $ units             : chr  "g" "mg" "mg" "mg" ...
 $ frequency         : chr  "TDS" "BD" "6H" "TDS" ...
 $ duration          : num  3 1.5 4 2 4 3 3 2 1 2 ...

> str(drug_administrations)
'data.frame':   2818 obs. of  7 variables:
 $ patient_id         : chr  "5124578766" "5124578766" "5124578766" "5124578766" ...
 $ prescription_id    : chr  "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" "66cac1c5eab88d72c8b7687966357f5b" ...
 $ tr_DESC            : chr  "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" "Piperacillin / Tazobactam" ...
 $ route              : chr  "IV" "IV" "IV" "IV" ...
 $ dose               : num  4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 500 ...
 $ units              : chr  "g" "g" "g" "g" ...
 $ administration_date: POSIXct, format: "2015-08-04 14:45:16" "2015-08-04 23:45:16" "2015-08-05 08:45:16" "2015-08-05 17:45:16" ...

razekmh commented 2 months ago

I am creating a dummy dataset using omock by passing in the following code. The variables are set to default examples from the omock code and comments. I am also creating a ramses dataset to match against

library(Ramses)
library(omock)
library(dplyr)

ramses_db <- connect_local_database("ramses-db.duckdb")

cdm <- mockCdmReference() |>
    mockPerson() |>
    mockObservationPeriod() |>
    mockConditionOccurrence(recordPerson = 10) |>
    mockDrugExposure(recordPerson = 40) |>
    mockMeasurement(recordPerson = 5) |>
    mockDeath(recordPerson = 1)

Tackling the drug_prescription table starts from the drug_exposure table from omock.

> cdm$drug_exposure
# A tibble: 180 × 6
   drug_concept_id person_id drug_exposure_start_date drug_exposure_end_date drug_exposure_id drug_type_concept_id
 *           <dbl>     <int> <date>                   <date>                            <int>                <dbl>
 1              10         9 1991-07-23               1995-03-08                            1                    1
 2              10         4 1994-10-23               2003-10-24                            2                    1
 3              10         7 2014-03-15               2014-03-22                            3                    1
 4              10         1 2002-04-13               2004-12-14                            4                    1
 5              10         2 1999-12-08               2001-02-27                            5                    1
 6              10         7 2014-03-13               2014-03-26                            6                    1
 7              10         2 1999-04-22               2002-11-26                            7                    1
 8              10         3 2015-03-01               2015-05-28                            8                    1
 9              10         1 2002-08-13               2005-10-05                            9                    1
10              10         5 2012-05-16               2012-09-02                           10                    1
# ℹ 170 more rows
# ℹ Use `print(n = ...)` to see more rows

We can see that drug_concept_id is the attribute is present which is representing

A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.

However the attributes effective_drug_dose and dose_unit_concept_id are missing. These two attributes are not required in the OMOP standard so it is acceptable they are missing in the mock data.

razekmh commented 2 months ago

Tracing the drug_concept_id to its primary table leads to the concept table.

> cdm$concept
# A tibble: 3,245 × 10
   concept_id concept_name  domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
 *      <dbl> <chr>         <chr>     <chr>         <chr>            <chr>            <chr>        <chr>           
 1          1 Musculoskele… Condition SNOMED        S                Clinical Finding 1234         NA              
 2          2 Osteoarthros… Condition SNOMED        S                Clinical Finding 1234         NA              
 3          3 Arthritis     Condition SNOMED        S                Clinical Finding 1234         NA              
 4          4 Osteoarthrit… Condition SNOMED        S                Clinical Finding 1234         NA              
 5          5 Osteoarthrit… Condition SNOMED        S                Clinical Finding 1234         NA              
 6          6 Osteonecrosis Condition SNOMED        S                Clinical Finding 1234         NA              
 7          7 Degenerative… Condition Read          NA               Diagnosis        1234         NA              
 8          8 Knee osteoar… Condition Read          NA               Diagnosis        1234         NA              
 9          9 H/O osteoart… Observat… LOINC         S                Observation      1234         NA              
10         10 Adalimumab    Drug      RxNorm        S                Ingredient       1234         NA              
# ℹ 3,235 more rows
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr>
# ℹ Use `print(n = ...)` to see more rows

The concept table contains more than just the drugs reference. We can inspect the list of the concept domain id

> unique(cdm$concept$domain_id)
[1] "Condition"   "Observation" "Drug"        "Ethnicity"   "Gender"      "Race"        "Unit"        "Visit"      
[9] "Measurement"

We can filter the concept table to show only the drug concepts

> cdm$concept |> filter(domain_id == 'Drug')
# A tibble: 6 × 10
  concept_id concept_name   domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
       <dbl> <chr>          <chr>     <chr>         <chr>            <chr>            <chr>        <chr>           
1         10 Adalimumab     Drug      RxNorm        S                Ingredient       1234         NA              
2         11 Injection      Drug      OMOP          NA               Dose Form        1234         NA              
3         12 ALIMENTARY TR… Drug      ATC           NA               ATC 1st          1234         NA              
4         13 Descendant dr… Drug      RxNorm        S                Drug             1234         NA              
5         14 Injectable     Drug      OMOP          NA               Dose Form        1234         NA              
6         19 Other ingredi… Drug      RxNorm        S                Ingredient       1234         NA              
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr>

razekmh commented 2 months ago

Let's the save this table to prepare it for more processing

> drug_concepts <- cdm$concept |> filter(domain_id == 'Drug')

One of the steps of the validation process is to match the drug codes using the AMR package. As follows.

# attempting to map drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$tr_DESC)

We can try the same with the list of drugs that were found in the concept table

> drug_concepts$drug_code <- drug_concepts$concept_name |> AMR::as.ab()
Warning message:
in as.ab(): these values could not be coerced to a valid antimicrobial ID: "Adalimumab", "Injectable",
"Injection", and "Other ingredient".

We get a partial match as seen in the table

> drug_concepts |> select(concept_id,concept_name, drug_code)
# A tibble: 6 × 3
  concept_id concept_name                    drug_code
       <dbl> <chr>                           <ab>     
1         10 Adalimumab                      NA       
2         11 Injection                       NA       
3         12 ALIMENTARY TRACT AND METABOLISM PPA      
4         13 Descendant drug                 SLF9     
5         14 Injectable                      NA       
6         19 Other ingredient                NA

Maybe manual editing is required for this step? as in the example from validate article

# editing drug names
drug_prescriptions$drug_code <- gsub("Vancomycin protocol", 
                                   "Vancomycin",
                                   drug_prescriptions$tr_DESC)
# mapping drug name using AMR package
drug_prescriptions$drug_code <- AMR::as.ab(drug_prescriptions$drug_code)
drug_prescriptions$drug_name <- AMR::ab_name(drug_prescriptions$drug_code)

I suspect the actual data will have more drugs to match.

razekmh commented 2 months ago

Going back to the drug_exposure table, we could now extract the rows which matches the concept_id from our newly created drug_concepts table.

> cdm$drug_exposure |> filter(drug_concept_id %in% drug_concepts$concept_id)
# A tibble: 2,400 × 6
   drug_concept_id person_id drug_exposure_start_date drug_exposure_end_date drug_exposure_id drug_type_concept_id
             <dbl>     <int> <date>                   <date>                            <int>                <dbl>
 1              10         9 1987-03-04               2004-10-31                            1                    1
 2              10         4 1997-04-16               2005-03-29                            2                    1
 3              10         7 2014-03-09               2014-03-24                            3                    1
 4              10         1 2004-05-16               2008-08-06                            4                    1
 5              10         2 1999-11-17               2001-07-19                            5                    1
 6              10         7 2014-03-13               2014-03-22                            6                    1
 7              10         2 1999-10-06               2001-08-11                            7                    1
 8              10         3 2015-01-26               2015-05-26                            8                    1
 9              10         1 2002-04-04               2007-06-28                            9                    1
10              10         5 2012-05-13               2012-06-28                           10                    1
# ℹ 2,390 more rows
# ℹ Use `print(n = ...)` to see more rows

2,400 are all the rows of drug_exposure

> cdm$drug_exposure
# A tibble: 2,400 × 6

Inspecting the distribution of the drug codes in the drug_exposure we find uniform distribution

> cdm$drug_exposure |> 
+     filter(drug_concept_id %in% drug_concepts$concept_id) |>
+     group_by(drug_concept_id) |>
+     summarise(count = n())
# A tibble: 6 × 2
  drug_concept_id count
            <dbl> <int>
1              10   400
2              11   400
3              12   400
4              13   400
5              14   400
6              19   400

razekmh commented 2 months ago

@AngharadGreen @zsenousy

I believe the next steps are to explore the data in one of the example datasets and try to use it to replicate the steps in the article on Validate and load electronic health records. Some patterns and problems are expected to arise from this process. Since we do not have access to the actual data yet, we are not aiming for a full solution yet. However a general understanding of the feasibility of matching the two data models is what we aim for now.

As you might see above I started working with two data frames provided by RAMSES as examples for what the EHR data would look like. I think it is worth it to try to match the data structure of both data frames from OMOP data from either example dataset we have access to.

Working with omock so far has shown that we could get some insights into how the data is organised in OMOP. For example we know that the drug names are standardised using the concept table. Also we can see that the vocabulary_id is either RxNorm, ATC, or OMOP which seems to follow different standards by itself. Maybe we could dig a little bit into this, keeping in mind that the hospital might be using one specific standard or a mix (we do not know, but we could try and ask)

> cdm$concept |> filter(domain_id == 'Drug')
# A tibble: 6 × 10
  concept_id concept_name   domain_id vocabulary_id standard_concept concept_class_id concept_code valid_start_date
       <dbl> <chr>          <chr>     <chr>         <chr>            <chr>            <chr>        <chr>           
1         10 Adalimumab     Drug      RxNorm        S                Ingredient       1234         NA              
2         11 Injection      Drug      OMOP          NA               Dose Form        1234         NA              
3         12 ALIMENTARY TR… Drug      ATC           NA               ATC 1st          1234         NA              
4         13 Descendant dr… Drug      RxNorm        S                Drug             1234         NA              
5         14 Injectable     Drug      OMOP          NA               Dose Form        1234         NA              
6         19 Other ingredi… Drug      RxNorm        S                Ingredient       1234         NA              
# ℹ 2 more variables: valid_end_date <chr>, invalid_reason <chr>

Also we can see that the step of matching the drug names using the AMR package, we got only a partial match

> drug_concepts |> select(concept_id,concept_name, drug_code)
# A tibble: 6 × 3
  concept_id concept_name                    drug_code
       <dbl> <chr>                           <ab>     
1         10 Adalimumab                      NA       
2         11 Injection                       NA       
3         12 ALIMENTARY TRACT AND METABOLISM PPA      
4         13 Descendant drug                 SLF9     
5         14 Injectable                      NA       
6         19 Other ingredient                NA

I am not sure if this step will be absolutely necessary but based on the note in the article is seems to be essential to extract the DDD.

Using the AMR package, it is possible to infer important properties of antibacterial/antifungal drugs:

ATC classes and groups, which can be useful for instance in order to identify antifungals (group ‘Antimycotics for systemic use’)

Defined Daily Doses (DDDs), which are commonly used to measure drug consumption.

razekmh commented 2 months ago

I would suggest we try the same steps with the data from the big data open dataset. The concept table has 3,902,588 entries which holds much more potential than the omock dataset. @AngharadGreen Please refer to #15 on accessing the data. Please comment here if you face any issues with accessing the data

AngharadGreen commented 2 months ago

@razekmh @AngharadGreen

Initial mapping RAMSES - OMOP

1. Patient Information

OMOP CDM Tables:

PERSON

OBSERVATION_PERIOD

DEATH

RAMSES DB Equivalent:

patients

Mapping Details:

PERSON: The patients table in RAMSES corresponds directly with the OMOP PERSON table. Here, the person_id from OMOP is mapped to patient_id in RAMSES. Additional patient demographics such as gender_concept_id (OMOP) align with sex field in RAMSES.

OBSERVATION_PERIOD: The OMOP OBSERVATION_PERIOD table captures the time span during which the patient is observed (e.g., start and end dates). While RAMSES does not have a direct equivalent table, this can be inferred using the ward_start and ward_end fields in inpatient_ward_movements and/or the admission_date and discharge_date in inpatient_episodes. Any other thoughts?

DEATH: In OMOP, death_date is recorded in the DEATH table. This information is captured in RAMSES under date_of_death in the patients table.

2. Clinical Events

OMOP CDM Tables:

CONDITION_OCCURRENCE

DRUG_EXPOSURE

PROCEDURE_OCCURRENCE

DEVICE_EXPOSURE

OBSERVATION

MEASUREMENT

RAMSES DB Equivalent:

inpatient_diagnoses

drug_prescriptions

drug_therapy_episodes

inpatient_episodes

microbiology_specimens

microbiology_isolates

inpatient_investigations

Mapping Details:

CONDITION_OCCURRENCE: The inpatient_diagnoses table in RAMSES captures inpatient diagnoses and can be mapped to OMOP’s CONDITION_OCCURRENCE table. Here, condition_concept_id from OMOP corresponds to icd_code in RAMSES, with fields like diagnosis_position offering additional context.

DRUG_EXPOSURE: This OMOP table records drug exposure events and corresponds to drug_prescriptions and drug_therapy_episodes in RAMSES. Fields such as drug_concept_id (OMOP) map to ATC_code and prescription_text in RAMSES. Any thoughts?

PROCEDURE_OCCURRENCE: Procedures performed on patients in OMOP can be mapped to inpatient_episodes in RAMSES. While OMOP uses procedure_concept_id, in RAMSES, this information might be inferred using fields related to specific care episodes, such as episode_number and main_specialty_code.

DEVICE_EXPOSURE: OMOP’s DEVICE_EXPOSURE table captures data on devices used on patients. RAMSES does not have a direct equivalent for device data.

OBSERVATION: General observations in OMOP, not covered by other tables, can be mapped to inpatient_investigations in RAMSES, which records various types of observations conducted during inpatient care. Not sure about this, any thoughts?

MEASUREMENT: OMOP’s MEASUREMENT table captures laboratory results and vital signs. In RAMSES, this can be split between:

microbiology_specimens: Captures the specimen details such as specimen_type_code, corresponding to lab samples.

microbiology_isolates: Records specific organism-related findings, which is crucial for microbiological data.

3. Microbiology and Laboratory Data

OMOP CDM Tables:

MEASUREMENT

SPECIMEN

RAMSES DB Equivalent:

microbiology_isolates

microbiology_specimens

Mapping Details:

MEASUREMENT: As explained above, this is split between:

microbiology_specimens: To capture details about the specimen collected.

microbiology_isolates: To record specific lab results related to microbiology, such as organism codes and susceptibility results.

SPECIMEN: This table in OMOP directly corresponds to microbiology_specimens in RAMSES, mapping fields like specimen_concept_id to specimen_type_code.

4. Healthcare Encounters

OMOP CDM Tables:

VISIT_OCCURRENCE

CARE_SITE

PROVIDER

RAMSES DB Equivalent:

inpatient_ward_movements

Mapping Details:

VISIT_OCCURRENCE: This OMOP table captures each healthcare encounter, and it can be mapped to inpatient_ward_movements in RAMSES. Fields like visit_start_date and visit_end_date in OMOP align with ward_start and ward_end in RAMSES.

CARE_SITE and PROVIDER: RAMSES does not seem to have direct equivalents for these tables.

5. Missing or Unmapped Items

Gaps in RAMSES Database:

DEVICE_EXPOSURE: No direct equivalent table for device data, indicating a missing component.

OBSERVATION_PERIOD: Needs to be inferred as there is no direct mapping.

CARE_SITE and PROVIDER: No tables capturing this specific information. I believe this can be ignored but we need confirmation from medical professionals (Use cases), if this required.

I have put together this visualisation to map the OMOP tables to Ramses tables: OMOK vs RAMSES.drawio.pdf

AngharadGreen commented 2 months ago

I have found this website is very useful for understanding the OMOP CDM - https://ohdsi.github.io/CommonDataModel/cdm54.html

AngharadGreen commented 2 months ago

I would suggest we try the same steps with the data from the big data open dataset. The concept table has 3,902,588 entries which holds much more potential than the omock dataset. @AngharadGreen Please refer to #15 on accessing the data. Please comment here if you face any issues with accessing the data

@razekmh I have struggled trying to access the data from BigQuery public datasets but have found this https://github.com/OHDSI/ETL-CMS.git and I am working through the instructions to download the dataset as it's the same one on the BigQuery public datasets

zsenousy commented 2 months ago

@razekmh @AngharadGreen

Mapping RAMSES fields to OMOP DRUG_EXPOSURE fields

Load the necessary libraries

library(Ramses)
library(omock)
library(dplyr)
library(AMR)  # For drug name mapping

Connect to local database for Ramses

ramses_db <- connect_local_database("ramses-db.duckdb")

Generate mock OMOP CDM data

cdm <- mockCdmReference() |>
  mockPerson() |>
  mockObservationPeriod() |>
  mockConditionOccurrence(recordPerson = 10) |>
  mockDrugExposure(recordPerson = 40) |>
  mockMeasurement(recordPerson = 5) |>
  mockDeath(recordPerson = 1)

Simulate RAMSES `drug_prescriptions` table

drug_prescriptions <- data.frame(
  patient_id = c("5124578766", "4874231672", "6292626973", "6292626973"),
  prescription_id = c("66cac1c5eab88d72c8b7687966357f5b", "8ccd67f4730b62ceafb8bcb27996c10c", "72cf4b592b0f4e2143b4bb9d7c569c97", "806c86b55cf50505a20b722f081c4075"),
  rxsummary = c("Piperacillin / Tazobactam IVI 4.5 g TDS", "Ciprofloxacin ORAL 500 mg BD", "Flucloxacillin ORAL 500 mg 6H", "Metronidazole ORAL 400 mg TDS"),
  authoring_date = as.POSIXct(c("2015-08-04 13:07:16", "2017-07-06 09:49:31", "2017-01-13 17:34:41", "2017-11-12 09:09:13")),
  prescription_start = as.POSIXct(c("2015-08-04 14:45:16", "2017-07-06 10:26:31", "2017-01-13 18:48:41", "2017-11-12 09:44:13")),
  prescription_end = as.POSIXct(c("2015-08-07 14:45:16", "2017-07-07 22:26:31", "2017-01-17 18:48:41", "2017-11-14 09:44:13")),
  tr_DESC = c("Piperacillin / Tazobactam", "Ciprofloxacin", "Flucloxacillin", "Metronidazole"),
  route = c("IV", "ORAL", "ORAL", "ORAL"),
  dose = c(4.5, 500, 500, 400),
  units = c("g", "mg", "mg", "mg"),
  frequency = c("TDS", "BD", "6H", "TDS"),
  duration = c(3, 1.5, 4, 2)
)

Map RAMSES fields to OMOP DRUG_EXPOSURE fields

omop_drug_exposure <- drug_prescriptions %>%
  mutate(
    # Mapping patient_id to OMOP's person_id
    person_id = patient_id,

    # Prescription ID in RAMSES maps to drug_exposure_id in OMOP
    drug_exposure_id = prescription_id,

    # Authoring date and prescription start/end in RAMSES map to drug_exposure_start/end in OMOP
    drug_exposure_start_date = prescription_start,
    drug_exposure_end_date = prescription_end,

    # Use AMR package to map drug names (tr_DESC) to OMOP's drug_concept_id
    drug_code = AMR::as.ab(tr_DESC),

    # Use route mapping (IV, ORAL, etc.) to route_concept_id
    route_concept_id = case_when(
      route == "IV" ~ 4132165,  # Concept ID for intravenous route (example)
      route == "ORAL" ~ 4122237,  # Concept ID for oral route (example)
      TRUE ~ NA_real_  # If unknown, NA
    ),

    # Dose and units map to dose_value and dose_unit_concept_id in OMOP
    dose_value = dose,
    dose_unit_concept_id = case_when(
      units == "g" ~ 8576,  # Concept ID for grams (example)
      units == "mg" ~ 8577,  # Concept ID for milligrams (example)
      TRUE ~ NA_real_
    ),

    # Map frequency (TDS, BD, etc.) to free-text sig in OMOP, or standard mapping if available
    sig = paste0(dose, units, " ", frequency),

    # Calculate the duration of the drug exposure
    duration_days = as.numeric(difftime(prescription_end, prescription_start, units = "days"))
  )

View the final omop_drug_exposure table

print(omop_drug_exposure)

Validate drug mappings using AMR and Ramses functions

validate_drug_mapping <- function(df) {
  if (all(!is.na(df$drug_code))) {
    message("All drugs successfully mapped to AMR codes!")
  } else {
    message("Some drug mappings failed. Please check the following:")
    print(df %>% filter(is.na(drug_code)))
  }
}

Validate the mappings

validate_drug_mapping(omop_drug_exposure)

Further validation before loading into the database

validate_prescriptions(omop_drug_exposure)

Load the data into the Ramses database or OMOP CDM

dbWriteTable(ramses_db, "drug_exposure", omop_drug_exposure, overwrite = TRUE)

Verify the data was loaded correctly

loaded_data <- dbReadTable(ramses_db, "drug_exposure")
print(loaded_data)

Close the database connection

dbDisconnect(ramses_db)

razekmh commented 2 months ago

I think this issue has served its purpose. We will have to split it to multiple issues, one per table from the validate article

SAFEHR-data / ramses-package

Mapping of OMOP CDM data to RAMSES DB #17

Initial mapping RAMSES - OMOP

1. Patient Information

OMOP CDM Tables:

RAMSES DB Equivalent:

2. Clinical Events

OMOP CDM Tables:

RAMSES DB Equivalent:

3. Microbiology and Laboratory Data

OMOP CDM Tables:

RAMSES DB Equivalent:

4. Healthcare Encounters

OMOP CDM Tables:

RAMSES DB Equivalent:

5. Missing or Unmapped Items

Initial mapping RAMSES - OMOP

1. Patient Information

OMOP CDM Tables:

RAMSES DB Equivalent:

2. Clinical Events

OMOP CDM Tables:

RAMSES DB Equivalent:

3. Microbiology and Laboratory Data

OMOP CDM Tables:

RAMSES DB Equivalent:

4. Healthcare Encounters

OMOP CDM Tables:

RAMSES DB Equivalent:

5. Missing or Unmapped Items

Mapping RAMSES fields to OMOP DRUG_EXPOSURE fields

Load the necessary libraries

Connect to local database for Ramses

Generate mock OMOP CDM data

Simulate RAMSES drug_prescriptions table

Map RAMSES fields to OMOP DRUG_EXPOSURE fields

View the final omop_drug_exposure table

Validate drug mappings using AMR and Ramses functions

Validate the mappings

Further validation before loading into the database

Load the data into the Ramses database or OMOP CDM

Verify the data was loaded correctly

Close the database connection

Simulate RAMSES `drug_prescriptions` table