OHDSI / MIMIC

MIMIC (Medical Information Mart for Intensive Care) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. This repository contains the ETL to the OMOP CDM.
Apache License 2.0

MIMIC IV to OMOP CDM Conversion

(Demo data link: https://doi.org/10.13026/p1f5-7x35)

This project implements an ETL conversion of the MIMIC IV PhysioNet dataset to the OMOP CDM format.

Concepts / Philosophy

The ETL is based on five logical steps.
The ETL process encapsulates the following workflows:

How to run the conversion

To run the ETL end-to-end:
To view the unit test (UT) and metrics reports:
-- Metrics - row count
SELECT * FROM metrics_dataset.me_total ORDER BY table_name;
-- Metrics - person and visit summary
SELECT
    category, name, count AS row_count
FROM metrics_dataset.me_persons_visits ORDER BY category, name;
-- Metrics - Mapping rates
SELECT
    table_name, concept_field, 
    count   AS rows_mapped, 
    percent AS percent_mapped,
    total   AS rows_total
FROM metrics_dataset.me_mapping_rate 
ORDER BY table_name, concept_field
;
-- Metrics - Mapped and Unmapped source values
SELECT 
    table_name, concept_field, category, source_value, concept_id, concept_name,
    count       AS row_count,
    percent     AS rows_percent
FROM metrics_dataset.me_tops_together 
ORDER BY table_name, concept_field, category, count DESC;
-- UT report
SELECT report_starttime, table_id, test_type, field_name, test_passed
FROM mimiciv_full_metrics_2023_02_17.report_unit_test
-- WHERE NOT test_passed  -- uncomment to list failed tests only
ORDER BY table_id, report_starttime
;
More options for running parts of the ETL:

Change Log (latest first)

2023-02-17

2022-09-09

2021-05-17

2021-03-08

2021-02-22

2021-02-08

2021-02-01

2021-01-25

2021-01-13

2020-12-14

2020-12-03

2020-12-01

2020-11-30

2020-11-20

2020-11-13

2020-11-12

2020-11-10

2020-10-27

2020-10-26

2020-10-21

2020-10-21

2020-10-18

2020-10-12

2020-10-01

2020-09-24

2020-09-22

2020-09-18

2020-09-15

2020-09-11

Start development

The End

Concepts / best practice / lessons learned

Concepts / Philosophy

The ETL is based on four steps.

The unit_id field is composed during the ETL steps. Read from right to left, it holds the source table name, the abbreviation of the initial target table, and the name or abbreviation of the final target table. For example, unit_id = 'drug.cond.diagnoses_icd' means that the rows belong to the Drug_exposure table, were initially prepared for the Condition_occurrence table, and originate from the source table diagnoses_icd.
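
The unit_id layout described above can be illustrated with a small Python sketch. It is not part of the ETL itself: the names UnitId and parse_unit_id are hypothetical, and the sketch assumes the three components are separated by dots and contain no dots themselves, as in the example value. It handles only the three-part form shown here.

```python
from typing import NamedTuple


class UnitId(NamedTuple):
    """Components of a unit_id, in left-to-right order (hypothetical helper type)."""
    final_target: str    # final target table name or abbreviation, e.g. "drug"
    initial_target: str  # abbreviation of the initial target table, e.g. "cond"
    source_table: str    # source table name, e.g. "diagnoses_icd"


def parse_unit_id(unit_id: str) -> UnitId:
    """Split a three-part, dot-separated unit_id into its components.

    Assumption: components never contain a dot. Any other shape is rejected
    rather than guessed at.
    """
    parts = unit_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated parts, got: {unit_id!r}")
    return UnitId(*parts)


# Example from the text: rows ended up in Drug_exposure, were first prepared
# for Condition_occurrence, and come from the source table diagnoses_icd.
parsed = parse_unit_id("drug.cond.diagnoses_icd")
print(parsed.source_table, "->", parsed.initial_target, "->", parsed.final_target)
```

Reading the printed components from left to right follows the path a row took through the ETL, which is the reverse of the order in which they are stored in unit_id.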

How do I get set up?

Contribution guidelines