dancoster / DrugLab

Repository for the drug<>lab pair

Create Dataset with longitudinal observation from HiRiD #36

Open dancoster opened 1 year ago

dancoster commented 1 year ago

Take mean of hourly measures.

lab measurements

vital_signs = ['Heart Rate', 'Respiratory rate', 'Oxygen saturation', 'Systolic blood pressure', 'Diastolic blood pressure', 'Temperature']
labs_bmp = ['Glucose', 'Potassium', 'Sodium', 'Chloride', 'Creatinine', 'Blood urea nitrogen', 'Bicarbonate', 'Calcium', 'Albumin', 'Lactate dehydrogenase', 'Magnesium', 'Lactic acid']
labs_cbc = ['Hematocrit', 'Hemoglobin', 'Platelets', 'White blood cell count', 'Red blood cell count', 'Mean corpuscular volume', 'Lymphocytes', 'Neutrophils']
labs_cauglation = ['Prothrombin time INR']
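A minimal sketch of the "mean of hourly measures" step, assuming the raw observations are in long format. The column names `patientid`, `datetime`, `variable` and `value` are hypothetical and would need to be adapted to the actual HiRID tables.

```python
import pandas as pd

def hourly_mean(obs: pd.DataFrame) -> pd.DataFrame:
    """Average all measurements of each variable within each hour, per patient.

    Assumes (hypothetical) long-format columns:
    patientid, datetime, variable, value.
    """
    obs = obs.copy()
    # Truncate every timestamp to the start of its hour.
    obs["hour"] = pd.to_datetime(obs["datetime"]).dt.floor("h")
    # One row per (patient, hour), one column per clinical measure.
    wide = obs.pivot_table(
        index=["patientid", "hour"],
        columns="variable",
        values="value",
        aggfunc="mean",
    )
    wide.columns.name = None
    return wide.reset_index()

example = pd.DataFrame({
    "patientid": [1, 1, 1, 1],
    "datetime": ["2115-07-27 19:05", "2115-07-27 19:40",
                 "2115-07-27 20:10", "2115-07-27 19:30"],
    "variable": ["Heart Rate", "Heart Rate", "Heart Rate", "Glucose"],
    "value": [80.0, 90.0, 100.0, 6.0],
})
hourly = hourly_mean(example)
```

Hours without any measurement of a given variable are left as NaN rather than imputed.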

PavanReddy28 commented 1 year ago

Generated HIRID_EXTRACT with most of the above lab tests as columns for (10,20) part.

No death time available in HIRID.

dancoster commented 1 year ago
PavanReddy28 commented 1 year ago

Shared the merged file - HIRID Extract

dancoster commented 1 year ago

Hi, I reviewed the file. Looks good, few comments:

  1. Please merge DATE and Hour into a single pandas datetime column (see 2115-07-27 19:00:00+00:00).
  2. There are too few Albumin measurements - please double check.
  3. Please remove all cols of Body fluids / Urine:
  4. Please create a function that maps the units to the units of MIMIC (I shared the MIMIC data file with you via Gmail):

    • Glucose, Hemoglobin, etc. (check all the units).
    • Then please compare (density plot + p-value of t-test) each clinical measure.
  5. Please compare (density plot + p-value of t-test) and merge Arterial and Venous blood into one col:

    • Hemoglobin [Mass/volume] in Blood & Hemoglobin [Mass/volume] in Arterial blood
    • Lactate [Moles/volume] in Venous blood & Lactate [Mass/volume] in Arterial blood
  6. Please compare (density plot + p-value of t-test) and merge:

    • Invasive diastolic arterial pressure & Non-invasive diastolic arterial pressure
    • Invasive systolic arterial pressure & Non-invasive systolic arterial pressure
  7. Why did you include the col 'Metronidazole tabl 200 mg'? It seems to be a drug.
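The DATE/Hour merge requested in this review could look roughly like the sketch below. It assumes `DATE` holds a calendar day and `HOUR` an integer hour of day (0-23). The helper name `add_charttime` is hypothetical, and the output column is called `CHARTTIME` only for illustration.

```python
import pandas as pd

def add_charttime(df: pd.DataFrame, date_col: str = "DATE",
                  hour_col: str = "HOUR") -> pd.DataFrame:
    """Combine a date column and an hour-of-day column into one UTC timestamp."""
    out = df.copy()
    day = pd.to_datetime(out[date_col], utc=True).dt.normalize()
    out["CHARTTIME"] = day + pd.to_timedelta(out[hour_col].astype(int), unit="h")
    return out

example = pd.DataFrame({"DATE": ["2115-07-27", "2115-07-27"], "HOUR": [19, 20]})
with_time = add_charttime(example)
```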

PavanReddy28 commented 1 year ago

Sending the density plots for the following:

PavanReddy28 commented 1 year ago

Hemoglobin [Mass/volume] in Blood & Hemoglobin [Mass/volume] in Arterial blood

TtestResult(statistic=120.05747587932802, pvalue=0.0, df=39808) Image

Lactate [Moles/volume] in Venous blood & Lactate [Mass/volume] in Arterial blood

TtestResult(statistic=8.601894346207528, pvalue=1.77624076255818e-17, df=1670) Image

Invasive diastolic arterial pressure & Non-invasive diastolic arterial pressure

TtestResult(statistic=6.114071547372929, pvalue=9.785966360147038e-10, df=48505) Image

Invasive systolic arterial pressure & Non-invasive systolic arterial pressure

TtestResult(statistic=31.800783697879396, pvalue=1.1542069839654229e-219, df=48513) Image
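A sketch of how such a pairwise comparison and merge might be done with `scipy.stats.ttest_ind`. The integer `df` values above suggest the default (equal-variance) test was used, but that is an assumption. The helper name, the example column names and the row-wise mean as the merge rule are all hypothetical choices. With tens of thousands of samples the p-value becomes tiny even for small shifts, so the sketch also reports a standardized mean difference.

```python
import numpy as np
import pandas as pd
from scipy import stats

def compare_and_merge(df: pd.DataFrame, col_a: str, col_b: str, merged_name: str):
    """t-test two measurement columns, then merge them into a single column."""
    a = df[col_a].dropna()
    b = df[col_b].dropna()
    result = stats.ttest_ind(a, b)  # default: Student's t, equal variances
    # Standardized mean difference (Cohen's d with a simple pooled SD).
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    effect_size = (a.mean() - b.mean()) / pooled_sd
    # Merge rule (assumption): row-wise mean, ignoring missing values.
    merged = df[[col_a, col_b]].mean(axis=1, skipna=True).rename(merged_name)
    return result.pvalue, effect_size, merged

example = pd.DataFrame({
    "Invasive systolic": [120.0, 130.0, np.nan, 110.0],
    "Non-invasive systolic": [np.nan, 126.0, 118.0, 114.0],
})
pvalue, effect_size, merged = compare_and_merge(
    example, "Invasive systolic", "Non-invasive systolic", "Systolic blood pressure"
)
```

A density plot for the visual check can be drawn from the same two series, for example with `Series.plot.kde()`.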

PavanReddy28 commented 1 year ago

Final HiRiD Longitudinal dataset - https://drive.google.com/file/d/1Txe2vKBsFkoeuQdsWPUwq1NLy4S1HUbL/view?usp=sharing

dancoster commented 1 year ago
  1. Please include only the features that are in MIMIC Longitudinal data (for example remove 'Pulmonary artery diastolic pressure').
  2. Please create a dictionary that maps the feature names of HiRID Longitudinal data to those of MIMIC Longitudinal data. Then change the col names to the same names as in MIMIC. I wish to be able to run the current code on this file. Then plot a density plot for each feature and compare its values in MIMIC and HiRID (with p-value). If the distributions are different (visually and statistically), create a function that maps HiRID units to the units of MIMIC.
  3. Please include only subjects who were hospitalized for at least 48h.
  4. There are too few Albumin measurements - please double check.
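The 48h inclusion criterion could be applied roughly as follows. This is a sketch assuming one row per observation, with `hadm_id`, `ADMITTIME` and `DISCHTIME` columns as in the final column list of this thread. The helper name is hypothetical.

```python
import pandas as pd

def keep_long_stays(df: pd.DataFrame, min_hours: float = 48) -> pd.DataFrame:
    """Keep only rows of admissions that lasted at least `min_hours`."""
    admit = pd.to_datetime(df["ADMITTIME"])
    disch = pd.to_datetime(df["DISCHTIME"])
    los_hours = (disch - admit).dt.total_seconds() / 3600
    return df[los_hours >= min_hours]

example = pd.DataFrame({
    "hadm_id": [1, 1, 2],
    "ADMITTIME": ["2115-07-27 19:00", "2115-07-27 19:00", "2115-08-01 08:00"],
    "DISCHTIME": ["2115-07-30 19:00", "2115-07-30 19:00", "2115-08-02 08:00"],
})
long_stays = keep_long_stays(example)
```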
PavanReddy28 commented 1 year ago

Comparing MIMIC and HIRID Longitudinal data features

Some columns need to be filtered out of the HIRID Longitudinal data based on the comparison table below.

Image

Merged MIMIC and HIRID.csv

Note: Used the following dictionary to map hirid and mimic features.

hirid_mapping = {
    'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None,
    'Albumin [Mass/volume] in Serum or Plasma' : 'Albumin',
    'Amylase [Enzymatic activity/volume] in Serum or Plasma': None,
    'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None,
    'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
    'Bilirubin.direct [Mass/volume] in Serum or Plasma': None,
    'Bilirubin.total [Moles/volume] in Serum or Plasma' : None,
    'Calcium [Moles/volume] in Blood': 'Calcium',
    'Calcium.ionized [Moles/volume] in Blood': 'Calcium',
    'Carboxyhemoglobin/Hemoglobin.total in Arterial blood': 'Hemoglobin',
    'Chloride [Moles/volume] in Blood': 'Chloride', 
    'Core body temperature': 'Temperature',
    'Creatinine [Moles/volume] in Blood': 'Creatinine', 
    'Diastolic arterial pressure': 'Diastolic blood pressure',
    'Glucose [Moles/volume] in Serum or Plasma': 'Glucose', 
    'Heart rate': 'Heart Rate',
    'Hemoglobin [Mass/volume] in blood': 'Hemoglobin',
    'INR in Blood by Coagulation assay': 'Prothrombin time INR', 
    'Lactate [Mass/volume] in blood': 'Lactic acid',
    'Lymphocytes [#/volume] in Blood': 'Lymphocytes', 
    'Magnesium [Moles/volume] in Blood': 'Magnesium',
    'Methemoglobin/Hemoglobin.total in Arterial blood': 'Hemoglobin',
    'Neutrophils/100 leukocytes in Blood': 'Neutrophils', 
    'Peripheral oxygen saturation': 'Oxygen saturation',
    'Platelets [#/volume] in Blood': 'Platelets', 
    'Potassium [Moles/volume] in Blood': 'Potassium',
    'Pulmonary artery diastolic pressure': 'Diastolic blood pressure',
    'Pulmonary artery systolic pressure': 'Systolic blood pressure', 
    'Respiratory rate': 'Respiratory rate',
    'Sodium [Moles/volume] in Blood': 'Sodium', 
    'Systolic arterial pressure': 'Systolic blood pressure'
}
PavanReddy28 commented 1 year ago

Final Mapping

Many lab tests were excluded for the mentioned reasons.

hirid_mimic_feature_mapping = {
    'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None, # Absent in MIMIC longitudinal data
    'Albumin [Mass/volume] in Serum or Plasma' : None, # Large difference in mean. Units are different?
    'Amylase [Enzymatic activity/volume] in Serum or Plasma': None,  # Absent in MIMIC longitudinal data
    'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None,  # Absent in MIMIC longitudinal data
    'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
    'Bilirubin.direct [Mass/volume] in Serum or Plasma': None,  # Absent in MIMIC longitudinal data
    'Bilirubin.total [Moles/volume] in Serum or Plasma' : None, # Absent in MIMIC longitudinal data
    'Calcium [Moles/volume] in Blood': None, # Large difference in mean. Units are different?
    'Calcium.ionized [Moles/volume] in Blood': None, # Large difference in mean. Not the same lab test, or units are different?
    'Carboxyhemoglobin/Hemoglobin.total in Arterial blood': None,  # Large difference in mean. Not the same lab test.
    'Chloride [Moles/volume] in Blood': 'Chloride', 
    'Core body temperature': 'Temperature',
    'Creatinine [Moles/volume] in Blood': None,  # Large difference in mean. Units are different? 
    'Diastolic arterial pressure': 'Diastolic blood pressure',
    'Glucose [Moles/volume] in Serum or Plasma': None, # Large difference in mean. Units are different? 
    'Heart rate': 'Heart Rate',
    'Hemoglobin [Mass/volume] in blood': None,   # Large difference in mean. Units are different? 
    'INR in Blood by Coagulation assay': 'Prothrombin time INR', 
    'Lactate [Mass/volume] in blood': 'Lactic acid',
    'Lymphocytes [#/volume] in Blood': 'Lymphocytes', 
    'Magnesium [Moles/volume] in Blood': 'Magnesium',
    'Methemoglobin/Hemoglobin.total in Arterial blood': None,  # Large difference in mean. Not same lab tests.
    'Neutrophils/100 leukocytes in Blood': None,  # Large difference in mean. Units are different? 
    'Peripheral oxygen saturation': 'Oxygen saturation',  
    'Platelets [#/volume] in Blood': 'Platelets', 
    'Potassium [Moles/volume] in Blood': 'Potassium',
    'Pulmonary artery diastolic pressure': None,  # Absent in MIMIC longitudinal data
    'Pulmonary artery systolic pressure': None, # Absent in MIMIC longitudinal data
    'Respiratory rate': 'Respiratory rate',
    'Sodium [Moles/volume] in Blood': 'Sodium', 
    'Systolic arterial pressure': 'Systolic blood pressure'
}
dancoster commented 1 year ago

Before generating the table please exclude inhuman values ('temp_mapping_140722_partial').
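A sketch of what excluding inhuman values could look like: readings outside a plausible range are set to NaN before any means are computed. The bounds below are illustrative placeholders only, not the contents of 'temp_mapping_140722_partial', whose format is not shown in this thread.

```python
import numpy as np
import pandas as pd

# Illustrative bounds only (assumption); the real ones should come from the mapping file.
PLAUSIBLE_RANGES = {
    "Heart Rate": (0, 350),
    "Temperature": (25, 45),
}

def mask_implausible(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Replace values outside [low, high] with NaN for every listed column."""
    out = df.copy()
    for col, (low, high) in ranges.items():
        if col in out.columns:
            out[col] = out[col].where(out[col].between(low, high))
    return out

example = pd.DataFrame({
    "Heart Rate": [80.0, 900.0, np.nan],
    "Temperature": [37.0, 36.5, 5.0],
})
cleaned = mask_implausible(example, PLAUSIBLE_RANGES)
```

Masking rather than dropping rows keeps the other measurements taken in the same hour.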

PavanReddy28 commented 1 year ago
hirid_mimic_mapping = {
    'Albumin [Mass/volume] in Serum or Plasma' : 'Albumin',
    'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
    'Calcium [Moles/volume] in Blood': 'Calcium', 
    'Hemoglobin [Mass/volume] in blood': 'Hemoglobin',
    'Chloride [Moles/volume] in Blood': 'Chloride', 
    'Core body temperature': 'Temperature',
    'Creatinine [Moles/volume] in Blood': 'Creatinine',
    'Diastolic arterial pressure': 'Diastolic blood pressure',
    'Glucose [Moles/volume] in Serum or Plasma': 'Glucose',
    'Heart rate': 'Heart Rate',
    'INR in Blood by Coagulation assay': 'Prothrombin time INR', 
    'Lactate [Mass/volume] in blood': 'Lactic acid',
    'Lymphocytes [#/volume] in Blood': 'Lymphocytes', 
    'Magnesium [Moles/volume] in Blood': 'Magnesium',
    'Neutrophils/100 leukocytes in Blood': 'Neutrophils', 
    'Peripheral oxygen saturation': 'Oxygen saturation',  
    'Platelets [#/volume] in Blood': 'Platelets', 
    'Potassium [Moles/volume] in Blood': 'Potassium',
    'Respiratory rate': 'Respiratory rate',
    'Sodium [Moles/volume] in Blood': 'Sodium', 
    'Systolic arterial pressure': 'Systolic blood pressure'
}
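A mapping like the one above could be applied as in the following sketch: columns mapped to `None` or absent from the mapping are dropped, the rest are renamed to their MIMIC names. The helper name `apply_feature_mapping` and the `id_cols` parameter are hypothetical.

```python
import pandas as pd

def apply_feature_mapping(df: pd.DataFrame, mapping: dict,
                          id_cols=("hadm_id",)) -> pd.DataFrame:
    """Keep identifier columns plus mapped features, renamed to MIMIC names."""
    keep = {src: dst for src, dst in mapping.items()
            if dst is not None and src in df.columns}
    targets = list(keep.values())
    duplicated = {t for t in targets if targets.count(t) > 1}
    if duplicated:
        # Two sources renamed to one target would create duplicate column names.
        raise ValueError(f"several source columns map to: {sorted(duplicated)}")
    ids = [c for c in id_cols if c in df.columns]
    return df[ids + list(keep)].rename(columns=keep)

mapping_subset = {
    'Heart rate': 'Heart Rate',
    'Core body temperature': 'Temperature',
    'Pulmonary artery systolic pressure': None,
}
example = pd.DataFrame({
    "hadm_id": [1],
    "Heart rate": [80.0],
    "Core body temperature": [37.0],
    "Pulmonary artery systolic pressure": [25.0],
})
renamed = apply_feature_mapping(example, mapping_subset)
```

The duplicate check matters for earlier drafts of the mapping, where several HiRID columns pointed at the same MIMIC name.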

Lab tests present in the MIMIC longitudinal dataset but missing from the HiRID longitudinal data:

{'Blood urea nitrogen',
 'Hematocrit',
 'Lactate dehydrogenase',
 'Mean corpuscular volume',
 'Red blood cell',
 'White blood cell count'}
dancoster commented 1 year ago
  1. For each clinical measure (vital/lab) in HiRID - generate a density plot + p-value to compare the distributions of the itemids of each clinical measure.
  2. For each clinical measure (vital/lab) - compare density plots between HiRID and MIMIC (use unit conversion if needed).
  3. Dan will look for missing clinical measures.
  4. In HiRID longitudinal: (a) Please include only the features that are in MIMIC Longitudinal data (for example remove 'Pulmonary artery diastolic pressure'). (b) Please include only subjects who were hospitalized for at least 48h.
  5. If the distributions are different (visually and statistically), create a function that maps HiRID units to the units of MIMIC.
  6. There are too few Albumin measurements - please double check.
dancoster commented 1 year ago

I'll review the missing clinical features in HiRID once the rest of the tasks are completed.

PavanReddy28 commented 1 year ago

Inhuman-value ranges are missing for the following clinical measures:

  1. Temperature
  2. Diastolic blood pressure
  3. Heart Rate
  4. Lactic acid
  5. Oxygen saturation
  6. Respiratory rate
  7. Systolic blood pressure

Comparison between HIRID and MIMIC Extract columns

Units that are not matching (based on mean and density plots):

  1. Albumin,
  2. Calcium,
  3. Creatinine,
  4. Glucose (HiRiD Mean is very low (20)),
  5. Lymphocytes,
  6. Magnesium,
  7. Neutrophils
dancoster commented 1 year ago

Do you need my assistance with the unmatched features? (Albumin, Calcium, Creatinine, Glucose, Lymphocytes, Magnesium, Neutrophils).

Here is the status of the tasks:

1+5. For each clinical measure (vital/lab) in HiRID - generate a density plot + p-value to compare the distributions of the itemids of each clinical measure.

  1. For each clinical measure (vital/lab) - compare density plots between HiRID and MIMIC (use unit conversion if needed). Dan will look for missing clinical measures. If the distributions are different (visually and statistically), create a function that maps HiRID units to the units of MIMIC. - In progress
  2. In HiRID longitudinal: (a) Please include only the features that are in MIMIC Longitudinal data (for example remove 'Pulmonary artery diastolic pressure'). Please list the features that are not in the intersection of MIMIC and HiRID and I'll validate that they indeed don't exist. (b) Please include only subjects who were hospitalized for at least 48h.
  3. There are too few Albumin measurements - please double check.
PavanReddy28 commented 1 year ago

Comparison between HiRiD and MIMIC Longitudinal dataset

I have added unit conversions for most of the features. The following need to be looked into more deeply (Dan's review):

Lymphocytes
MIMIC Mean = 13.511912675286833
HIRID Mean = 0.18385518590998043
Image

Magnesium
MIMIC Mean = 2.039457406430662
HIRID Mean = 0.368658508674493
Image

Neutrophils
MIMIC Mean = 77.99791328032335
HIRID Mean = 190.20241175276928
Image

Respiratory rate (inhuman values missing for Respiratory rate)
MIMIC Mean = 18.968796364866186
HIRID Mean = 61.39900703518324
Image

Albumin
MIMIC Mean = 3.0655902004454343
HIRID Mean = 0.8052631578947369
Image
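For reference, a unit-mapping function of the kind discussed in this thread might be structured as below. The factors are standard molar-mass conversions, but the source units assigned to each HiRID column (mmol/L, µmol/L, g/L) and the MIMIC target units are assumptions that must be checked against the HiRID variable reference. The large residual differences in means reported above suggest some assumptions of this kind may not hold for every feature.

```python
import pandas as pd

# Multiplicative factors HiRID -> MIMIC, ASSUMING these source/target units.
HIRID_TO_MIMIC_FACTOR = {
    "Glucose": 18.016,         # mmol/L -> mg/dL (glucose, 180.16 g/mol)
    "Creatinine": 1 / 88.42,   # umol/L -> mg/dL (creatinine, 113.12 g/mol)
    "Calcium": 4.008,          # mmol/L -> mg/dL (calcium, 40.08 g/mol)
    "Magnesium": 2.431,        # mmol/L -> mg/dL (magnesium, 24.31 g/mol)
    "Albumin": 0.1,            # g/L -> g/dL
}

def hirid_to_mimic_units(df: pd.DataFrame, factors: dict = None) -> pd.DataFrame:
    """Return a copy with the listed columns rescaled to MIMIC units."""
    factors = HIRID_TO_MIMIC_FACTOR if factors is None else factors
    out = df.copy()
    for col, factor in factors.items():
        if col in out.columns:
            out[col] = out[col] * factor
    return out

example = pd.DataFrame({"Glucose": [5.0], "Albumin": [35.0], "Heart Rate": [80.0]})
converted = hirid_to_mimic_units(example)
```

Columns without an entry in the factor table, such as the vital signs, pass through unchanged.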

HIRID Longitudinal Data

  1. Columns in hirid_extract missing in mimic_extract which were removed
    set(hirid_extract.columns).difference(mimic_extract.columns)
    {'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma',
    'Amylase [Enzymatic activity/volume] in Serum or Plasma',
    'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma',
    'Bilirubin.direct [Mass/volume] in Serum or Plasma',
    'Bilirubin.total [Moles/volume] in Serum or Plasma',
    'CHARTTIME',
    'Calcium.ionized [Moles/volume] in Blood',
    'Carboxyhemoglobin/Hemoglobin.total in Arterial blood',
    'DATE',
    'EST_DISCHTIME',
    'HOUR',
    'Methemoglobin/Hemoglobin.total in Arterial blood',
    'Pulmonary artery diastolic pressure',
    'Pulmonary artery systolic pressure',
    'discharge_status'}
  2. Columns in mimic_extract missing in hirid_extract which were removed
    set(mimic_extract.columns).difference(hirid_extract.columns) :
    {'Blood urea nitrogen',
    'DEATHTIME',
    'ETHNICITY',
    'Hematocrit',
    'Lactate dehydrogenase',
    'Mean corpuscular volume',
    'Red blood cell',
    'White blood cell count',
    'icustay_id',
    'subject_id'}

Final set of columns (same in both MIMIC and HiRID extracts):

['hadm_id', 'age', 'GENDER', 'Albumin', 'Bicarbonate', 'Calcium',
       'Chloride', 'Temperature', 'Creatinine', 'Diastolic blood pressure',
       'Glucose', 'Heart Rate', 'Hemoglobin', 'Prothrombin time INR',
       'Lactic acid', 'Lymphocytes', 'Magnesium', 'Neutrophils',
       'Oxygen saturation', 'Platelets', 'Potassium', 'Respiratory rate',
       'Sodium', 'Systolic blood pressure', 'DISCHTIME', 'charttime',
       'LABEL_48', 'ADMITTIME', 'Mortality', 'LOS']
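A small sketch of how both extracts can be restricted to their shared columns while keeping a stable column order, which the `set` differences above do not preserve. The helper name `align_columns` is hypothetical; `hirid_extract` and `mimic_extract` stand for the two DataFrames used in this comment.

```python
import pandas as pd

def align_columns(hirid_extract: pd.DataFrame, mimic_extract: pd.DataFrame):
    """Restrict both frames to their common columns, in MIMIC column order."""
    shared = [c for c in mimic_extract.columns if c in hirid_extract.columns]
    return hirid_extract[shared], mimic_extract[shared], shared

hirid_example = pd.DataFrame({"hadm_id": [1], "Sodium": [140.0], "HOUR": [19]})
mimic_example = pd.DataFrame({"hadm_id": [7], "Hematocrit": [40.0], "Sodium": [138.0]})
hirid_aligned, mimic_aligned, shared = align_columns(hirid_example, mimic_example)
```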

MIMIC Longitudinal Data - https://drive.google.com/open?id=1WuO1OJZZyvd4RwM0hTWk-FIDJlCnDduk&usp=drive_fs

HIRID Longitudinal Data - https://drive.google.com/open?id=1Wyp9CMP6aiPJb6_O75VyswTL0hWlmlWC&usp=drive_fs