USM-CHU-FGuyon / BlendedICU

OMOP standardization pipeline for ICU databases
MIT License

Question about config desired features #20

Closed: xinyuejohn closed this issue 1 month ago

xinyuejohn commented 4 months ago

Hi, I was wondering whether I can configure additional measurements to be added to the final OMOP measurement table.

For example, I would also like to extract pH and temperature from MIMIC-III. Is there an easy way to do so?

Thank you!

USM-CHU-FGuyon commented 4 months ago

Hi,

pH and temperature should already be extracted. Please let me know whether you get results similar to mine with this code:

import pandas as pd

from blended_preprocessing.omop_conversion import OMOP_converter

converter = OMOP_converter(initialize_tables=True)

fname = 'measurement_1'
df = pd.read_parquet(f'{converter.savedir}/measurement/{fname}.parquet')

# Keep only the rows whose concept is pH or temperature
df_pH = df.loc[df.measurement_concept_id == converter.concept_mapping['pH']]
df_temp = df.loc[df.measurement_concept_id == converter.concept_mapping['temperature']]

print(f'{df.visit_occurrence_id.nunique()} visits in {fname}')
print(f'{df_pH.visit_occurrence_id.nunique()} visits with pH measurement in {fname}')
print(f'{df_temp.visit_occurrence_id.nunique()} visits with temperature measurement in {fname}')

# Count measurement rows per source dataset: the dataset name is the part of
# visit_source_value before the first '-'
print('\nNumber of pH measurements from each source dataset:')
print(converter.visit_occurrence.loc[df_pH.visit_occurrence_id.values].visit_source_value.map(lambda x: x.split('-')[0]).value_counts())

print('\nNumber of temperature measurements from each source dataset:')
print(converter.visit_occurrence.loc[df_temp.visit_occurrence_id.values].visit_source_value.map(lambda x: x.split('-')[0]).value_counts())

You should get something like:

3795 visits in measurement_1
1749 visits with pH measurement in measurement_1
1059 visits with temperature measurement in measurement_1

Number of pH measurements from each source dataset:
visit_source_value
amsterdam    5619
eicu         3891
mimic        3600
hirid        2048
mimic3       1429
Name: count, dtype: int64

Number of temperature measurements from each source dataset:
visit_source_value
amsterdam    5619
eicu         3891
mimic        3600
hirid        2048
mimic3       1429
Name: count, dtype: int64

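
The per-source counts above are numbers of measurement rows, not of visits: a visit with several pH readings contributes several rows (1749 visits account for 16,587 pH rows in the output above). The small sketch below reproduces the same tally on made-up data and, for comparison, a per-visit version obtained by dropping repeated visit IDs first. It assumes, as the snippet does, that visit_occurrence is indexed by visit_occurrence_id and that visit_source_value starts with the dataset name followed by '-'; the table contents and the concept ID are invented for illustration.

```python
import pandas as pd

# Synthetic stand-ins for the OMOP visit_occurrence and measurement tables.
# All IDs, source values and concept IDs here are made up for illustration.
visit_occurrence = pd.DataFrame(
    {"visit_source_value": ["mimic3-100", "eicu-200", "eicu-201"]},
    index=pd.Index([1, 2, 3], name="visit_occurrence_id"),
)
measurement = pd.DataFrame(
    {
        "visit_occurrence_id": [1, 1, 1, 2, 3, 3],
        "measurement_concept_id": [10, 10, 10, 10, 20, 10],
    }
)

PH_CONCEPT_ID = 10  # hypothetical placeholder, not a real OMOP concept_id


def source_counts(df: pd.DataFrame, visits: pd.DataFrame) -> pd.Series:
    """Count rows of `df` per source dataset (prefix of visit_source_value)."""
    sources = visits.loc[df.visit_occurrence_id.values, "visit_source_value"]
    return sources.str.split("-").str[0].value_counts()


df_ph = measurement.loc[measurement.measurement_concept_id == PH_CONCEPT_ID]

# One count per measurement row (what the snippet above prints)
per_measurement = source_counts(df_ph, visit_occurrence)
# One count per visit: drop repeated measurements of the same visit first
per_visit = source_counts(
    df_ph.drop_duplicates(subset="visit_occurrence_id"), visit_occurrence
)

print(per_measurement.to_dict())  # {'mimic3': 3, 'eicu': 2}
print(per_visit.to_dict())        # {'eicu': 2, 'mimic3': 1}
```

In the real tables the same drop_duplicates step would turn the row counts into visit counts per dataset, if that is the figure you want to compare.
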
USM-CHU-FGuyon commented 4 months ago

As for adding new timeseries measurements, I should probably make a detailed guide... but here are the important steps:

  1. Add the new entry to auxillary_files/user_input/timeseries_variables.csv. It should specify the name of the variable in one or more datasets and the corresponding OMOP concept_id.
  2. For 1_extract_mimic3.py: ensure that the variable is kept in mimic3preparator.py (some variables are dropped to save space and computation time).
  3. For 2_mimic3.py: the new variable should already be in the tsp.kept_ts list; you may check that this is the case. If the variable's unit has to be harmonized between datasets, this can be done in the _harmonize_{dataset} functions of database_processing/timeseries_preprocessing.py.
  4. Step 3 should then run without changes.
  5. In Step 4, specify the unit's concept_id in OMOP_converter.unit_mapping.
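
The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the column names of timeseries_variables.csv, the standalone conversion function and the layout of unit_mapping are hypothetical (not taken from the BlendedICU code), and every concept_id is a placeholder 0 to be replaced by the real ID from the OMOP vocabulary. It shows the three kinds of change the list describes: a new variable row (item 1), a unit conversion so that datasets agree (item 3), and a unit concept entry (item 5).

```python
import io

import pandas as pd

# HYPOTHETICAL stand-in for auxillary_files/user_input/timeseries_variables.csv.
# The real file's columns may differ; inspect its header before editing it.
existing_csv = io.StringIO("blended,mimic3,concept_id\nheart_rate,Heart Rate,0\n")
variables = pd.read_csv(existing_csv)

# Item 1: add the new variable, its name in the source dataset(s) and its OMOP
# concept_id. 0 is a placeholder: look up the real id in the OMOP vocabulary.
new_row = {"blended": "my_variable", "mimic3": "my MIMIC-III label", "concept_id": 0}
variables = pd.concat([variables, pd.DataFrame([new_row])], ignore_index=True)


# Item 3: an example unit conversion of the kind that would go in a
# _harmonize_{dataset} function (this standalone function is hypothetical).
def fahrenheit_to_celsius(values: pd.Series) -> pd.Series:
    """Convert values recorded in degrees Fahrenheit to degrees Celsius."""
    return (values - 32) * 5 / 9


# Item 5: give the unit an OMOP unit concept_id.
# HYPOTHETICAL layout of unit_mapping (unit string -> concept_id); 0 is a placeholder.
unit_mapping = {"degC": 0}

print(variables)
print(fahrenheit_to_celsius(pd.Series([32.0, 212.0])).tolist())  # [0.0, 100.0]
```

Keeping the conversion as a pure function of a Series makes it easy to check on a few known values (freezing and boiling points) before wiring it into the pipeline.
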

Please tell me if you need assistance in doing so.

xinyuejohn commented 4 months ago

Thanks for your answer! I will probably try to add more timeseries measurements using your steps next week.

USM-CHU-FGuyon commented 4 months ago

Great! If you do, feel free to submit a pull request.