Open dancoster opened 1 year ago
Generated HIRID_EXTRACT with most of the above lab tests as columns for (10,20) part.
No death time available in HIRID.
Shared the merged file - HIRID Extract
Hi, I reviewed the file. Looks good; a few comments:
2115-07-27 19:00:00+00:00
). Please create a function that maps the HiRID units to the MIMIC units (I shared the MIMIC data file with you via Gmail):
Please compare (density plot + p-value of t-test) and merge Arterial and Venous blood into one col:
Please compare (density plot + p-value of t-test) and merge:
Invasive diastolic arterial pressure & Non-invasive diastolic arterial pressure
Invasive systolic arterial pressure & Non-invasive systolic arterial pressure
Why did you include the col: 'Metronidazole tabl 200 mg'? It seems like a drug
Send the density plots for the following:
TtestResult(statistic=120.05747587932802, pvalue=0.0, df=39808)
TtestResult(statistic=8.601894346207528, pvalue=1.77624076255818e-17, df=1670)
TtestResult(statistic=6.114071547372929, pvalue=9.785966360147038e-10, df=48505)
TtestResult(statistic=31.800783697879396, pvalue=1.1542069839654229e-219, df=48513)
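A rough sketch of how a compare-then-merge step like the one requested above could be done. It assumes the two measurements are row-aligned arrays with NaN for missing values, and it assumes an independent two-sample t-test; the thread does not say which t-test variant produced the results above. `compare_and_merge` is a hypothetical helper name, not something from the project code.

```python
import numpy as np
from scipy import stats


def compare_and_merge(a, b, grid_points=200):
    """Compare two row-aligned measurement arrays, then merge them into one column."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_obs, b_obs = a[~np.isnan(a)], b[~np.isnan(b)]

    # Independent two-sample t-test on the observed (non-missing) values.
    # Assumption: this is the variant wanted; pass equal_var=False for Welch's test.
    result = stats.ttest_ind(a_obs, b_obs)

    # Kernel density estimates on a shared grid; plotting these two curves
    # against `grid` gives the density plot.
    lo = min(a_obs.min(), b_obs.min())
    hi = max(a_obs.max(), b_obs.max())
    grid = np.linspace(lo, hi, grid_points)
    density_a = stats.gaussian_kde(a_obs)(grid)
    density_b = stats.gaussian_kde(b_obs)(grid)

    # Merge: keep the first source, fall back to the second where it is missing.
    merged = np.where(np.isnan(a), b, a)
    return result, (grid, density_a, density_b), merged


# Toy values, not taken from HiRID.
invasive = [120.0, float("nan"), 118.0, 125.0, float("nan")]
non_invasive = [float("nan"), 115.0, 119.0, float("nan"), 130.0]
result, (grid, dens_a, dens_b), merged = compare_and_merge(invasive, non_invasive)
```

Preferring the first column when both are present is only one possible merge rule; averaging the two would be another.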
Final HiRiD Longitudinal dataset - https://drive.google.com/file/d/1Txe2vKBsFkoeuQdsWPUwq1NLy4S1HUbL/view?usp=sharing
Need to filter out some columns from the HiRID Longitudinal data based on the comparison table below.
Note: Used the following dictionary to map hirid and mimic features.
hirid_mapping = {
'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None,
'Albumin [Mass/volume] in Serum or Plasma' : 'Albumin',
'Amylase [Enzymatic activity/volume] in Serum or Plasma': None,
'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None,
'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
'Bilirubin.direct [Mass/volume] in Serum or Plasma': None,
'Bilirubin.total [Moles/volume] in Serum or Plasma' : None,
'Calcium [Moles/volume] in Blood': 'Calcium',
'Calcium.ionized [Moles/volume] in Blood': 'Calcium',
'Carboxyhemoglobin/Hemoglobin.total in Arterial blood': 'Hemoglobin',
'Chloride [Moles/volume] in Blood': 'Chloride',
'Core body temperature': 'Temperature',
'Creatinine [Moles/volume] in Blood': 'Creatinine',
'Diastolic arterial pressure': 'Diastolic blood pressure',
'Glucose [Moles/volume] in Serum or Plasma': 'Glucose',
'Heart rate': 'Heart Rate',
'Hemoglobin [Mass/volume] in blood': 'Hemoglobin',
'INR in Blood by Coagulation assay': 'Prothrombin time INR',
'Lactate [Mass/volume] in blood': 'Lactic acid',
'Lymphocytes [#/volume] in Blood': 'Lymphocytes',
'Magnesium [Moles/volume] in Blood': 'Magnesium',
'Methemoglobin/Hemoglobin.total in Arterial blood': 'Hemoglobin',
'Neutrophils/100 leukocytes in Blood': 'Neutrophils',
'Peripheral oxygen saturation': 'Oxygen saturation',
'Platelets [#/volume] in Blood': 'Platelets',
'Potassium [Moles/volume] in Blood': 'Potassium',
'Pulmonary artery diastolic pressure': 'Diastolic blood pressure',
'Pulmonary artery systolic pressure': 'Systolic blood pressure',
'Respiratory rate': 'Respiratory rate',
'Sodium [Moles/volume] in Blood': 'Sodium',
'Systolic arterial pressure': 'Systolic blood pressure'
}
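Several HiRID features in this dictionary point at the same MIMIC name (for example both calcium entries). A small sketch for listing such many-to-one targets before merging; `find_collisions` is a hypothetical helper and the example uses only an excerpt of the mapping.

```python
from collections import defaultdict


def find_collisions(mapping):
    """Return {target: [sources]} for targets fed by more than one source column."""
    by_target = defaultdict(list)
    for source, target in mapping.items():
        if target is not None:  # None marks a feature with no MIMIC counterpart
            by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}


# Excerpt of the mapping above.
example = {
    'Calcium [Moles/volume] in Blood': 'Calcium',
    'Calcium.ionized [Moles/volume] in Blood': 'Calcium',
    'Amylase [Enzymatic activity/volume] in Serum or Plasma': None,
    'Heart rate': 'Heart Rate',
}
collisions = find_collisions(example)
```

Each flagged target needs an explicit decision (merge, keep one, or drop), since a plain rename would create duplicate column names.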
Many lab tests were excluded for the mentioned reasons.
hirid_mimic_feature_mapping = {
'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None, # Absent in MIMIC longitudinal data
'Albumin [Mass/volume] in Serum or Plasma' : None, # Large difference in mean. Units are different?
'Amylase [Enzymatic activity/volume] in Serum or Plasma': None, # Absent in MIMIC longitudinal data
'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma' : None, # Absent in MIMIC longitudinal data
'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
'Bilirubin.direct [Mass/volume] in Serum or Plasma': None, # Absent in MIMIC longitudinal data
'Bilirubin.total [Moles/volume] in Serum or Plasma' : None, # Absent in MIMIC longitudinal data
'Calcium [Moles/volume] in Blood': None, # Large difference in mean. Units are different?
'Calcium.ionized [Moles/volume] in Blood': None, # Large difference in mean. Not the same lab test, or units are different?
'Carboxyhemoglobin/Hemoglobin.total in Arterial blood': None, # Large difference in mean. Not same lab tests.
'Chloride [Moles/volume] in Blood': 'Chloride',
'Core body temperature': 'Temperature',
'Creatinine [Moles/volume] in Blood': None, # Large difference in mean. Units are different?
'Diastolic arterial pressure': 'Diastolic blood pressure',
'Glucose [Moles/volume] in Serum or Plasma': None, # Large difference in mean. Units are different?
'Heart rate': 'Heart Rate',
'Hemoglobin [Mass/volume] in blood': None, # Large difference in mean. Units are different?
'INR in Blood by Coagulation assay': 'Prothrombin time INR',
'Lactate [Mass/volume] in blood': 'Lactic acid',
'Lymphocytes [#/volume] in Blood': 'Lymphocytes',
'Magnesium [Moles/volume] in Blood': 'Magnesium',
'Methemoglobin/Hemoglobin.total in Arterial blood': None, # Large difference in mean. Not same lab tests.
'Neutrophils/100 leukocytes in Blood': None, # Large difference in mean. Units are different?
'Peripheral oxygen saturation': 'Oxygen saturation',
'Platelets [#/volume] in Blood': 'Platelets',
'Potassium [Moles/volume] in Blood': 'Potassium',
'Pulmonary artery diastolic pressure': None, # Absent in MIMIC longitudinal data
'Pulmonary artery systolic pressure': None, # Absent in MIMIC longitudinal data
'Respiratory rate': 'Respiratory rate',
'Sodium [Moles/volume] in Blood': 'Sodium',
'Systolic arterial pressure': 'Systolic blood pressure'
}
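One way a dictionary like this could be applied to the extract, assuming it is loaded as a pandas DataFrame: columns mapped to `None` are dropped and the rest are renamed to their MIMIC names. `apply_feature_mapping` is a hypothetical helper, and the toy frame below is made up for illustration.

```python
import pandas as pd


def apply_feature_mapping(df, mapping):
    """Drop columns mapped to None and rename the rest to their MIMIC names."""
    dropped = [c for c, target in mapping.items() if target is None and c in df.columns]
    renames = {c: t for c, t in mapping.items() if t is not None}
    # Columns that do not appear in the mapping (ids, timestamps) are left untouched.
    return df.drop(columns=dropped).rename(columns=renames)


# Excerpt of the mapping above and a toy frame (values are invented).
mapping_excerpt = {
    'Heart rate': 'Heart Rate',
    'Sodium [Moles/volume] in Blood': 'Sodium',
    'Pulmonary artery systolic pressure': None,
}
toy = pd.DataFrame({
    'hadm_id': [1, 2],
    'Heart rate': [80.0, 95.0],
    'Sodium [Moles/volume] in Blood': [140.0, 137.0],
    'Pulmonary artery systolic pressure': [25.0, 30.0],
})
mapped = apply_feature_mapping(toy, mapping_excerpt)
```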
Before generating the table, please exclude inhuman (physiologically implausible) values ('temp_mapping_140722_partial').
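A minimal sketch of the exclusion step. The bounds below are placeholders I made up for illustration (temperature assumed to be in Celsius); the real limits would come from the shared range file, whose format is not shown in this thread. `mask_implausible` is a hypothetical helper name.

```python
# Hypothetical plausibility bounds, NOT the values from the shared range file.
PLAUSIBLE_RANGES = {
    'Heart Rate': (0.0, 350.0),
    'Respiratory rate': (0.0, 300.0),
    'Temperature': (14.0, 47.0),  # assumes degrees Celsius
}


def mask_implausible(values, feature, ranges=PLAUSIBLE_RANGES):
    """Replace values outside the plausible range with None; missing values stay None."""
    if feature not in ranges:
        # No bounds known for this feature: leave the values as they are.
        return list(values)
    low, high = ranges[feature]
    return [v if v is not None and low <= v <= high else None for v in values]


cleaned = mask_implausible([18, 450, None, 22], 'Respiratory rate')
```

Masking instead of dropping rows keeps the other measurements recorded at the same timestamp.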
hirid_mimic_mapping = {
'Albumin [Mass/volume] in Serum or Plasma' : 'Albumin',
'Bicarbonate [Moles/volume] in Arterial blood':'Bicarbonate',
'Calcium [Moles/volume] in Blood': 'Calcium',
'Hemoglobin [Mass/volume] in blood': 'Hemoglobin',
'Chloride [Moles/volume] in Blood': 'Chloride',
'Core body temperature': 'Temperature',
'Creatinine [Moles/volume] in Blood': 'Creatinine',
'Diastolic arterial pressure': 'Diastolic blood pressure',
'Glucose [Moles/volume] in Serum or Plasma': 'Glucose',
'Heart rate': 'Heart Rate',
'INR in Blood by Coagulation assay': 'Prothrombin time INR',
'Lactate [Mass/volume] in blood': 'Lactic acid',
'Lymphocytes [#/volume] in Blood': 'Lymphocytes',
'Magnesium [Moles/volume] in Blood': 'Magnesium',
'Neutrophils/100 leukocytes in Blood': 'Neutrophils',
'Peripheral oxygen saturation': 'Oxygen saturation',
'Platelets [#/volume] in Blood': 'Platelets',
'Potassium [Moles/volume] in Blood': 'Potassium',
'Respiratory rate': 'Respiratory rate',
'Sodium [Moles/volume] in Blood': 'Sodium',
'Systolic arterial pressure': 'Systolic blood pressure'
}
Lab tests present in the MIMIC longitudinal dataset but missing from the HiRID longitudinal data:
{'Blood urea nitrogen',
'Hematocrit',
'Lactate dehydrogenase',
'Mean corpuscular volume',
'Red blood cell',
'White blood cell count'}
I'll review the clinical features missing in HiRID after completing the rest of the tasks.
Inhuman-value ranges are missing for the following lab tests:
Units that are not matching (based on mean and density plots):
Do you need my assistance with the unmatched features? (Albumin, Calcium, Creatinine, Glucose, Lymphocytes, Magnesium, Neutrophils).
Here is the status of the tasks: 1+5. For each clinical measure (vital/lab) in HiRID - generate a density plot + p-value to compare the distribution of the itemid of each clinical measure.
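For some of the unmatched features, a conversion helper could look roughly like this. The factors are standard molar-mass conversions, but the unit pairs are assumptions: HiRID is assumed to report mmol/L (µmol/L for creatinine) and MIMIC mg/dL, which should be checked against both data dictionaries. `TO_MIMIC_FACTOR` and `to_mimic_units` are hypothetical names.

```python
# Multiplicative factors from the ASSUMED HiRID unit to the ASSUMED MIMIC unit.
TO_MIMIC_FACTOR = {
    'Glucose': 18.016,       # mmol/L -> mg/dL (molar mass 180.16 g/mol)
    'Creatinine': 1 / 88.4,  # umol/L -> mg/dL (molar mass 113.12 g/mol)
    'Calcium': 4.008,        # mmol/L -> mg/dL (molar mass 40.08 g/mol)
    'Magnesium': 2.431,      # mmol/L -> mg/dL (molar mass 24.305 g/mol)
}


def to_mimic_units(feature, value):
    """Convert one value; features without a known factor pass through unchanged."""
    if value is None:
        return None
    return value * TO_MIMIC_FACTOR.get(feature, 1.0)


glucose_mg_dl = to_mimic_units('Glucose', 5.0)
creatinine_mg_dl = to_mimic_units('Creatinine', 88.4)
```

Features such as Lymphocytes and Neutrophils mix counts and percentages, so a single factor would not be enough there.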
I have added unit conversions for most of the features. The following needs to be looked into more deeply (Dan's Review)
Lymphocytes
MIMIC Mean = 13.511912675286833
HIRID Mean = 0.18385518590998043
Magnesium
MIMIC Mean = 2.039457406430662
HIRID Mean = 0.368658508674493
Neutrophils
MIMIC Mean = 77.99791328032335
HIRID Mean = 190.20241175276928
Respiratory rate
Inhuman-value range missing for Respiratory rate
MIMIC Mean = 18.968796364866186
HIRID Mean = 61.39900703518324
Albumin
MIMIC Mean = 3.0655902004454343
HIRID Mean = 0.8052631578947369
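A quick way to screen for unit mismatches from means like the ones listed above is to look at the MIMIC/HiRID ratio. The sketch below uses the means reported in this comment; the tolerance of 1.5 is an arbitrary assumption, and `flag_mismatches` is a hypothetical helper.

```python
# (MIMIC mean, HiRID mean) as reported above.
means = {
    'Lymphocytes': (13.511912675286833, 0.18385518590998043),
    'Magnesium': (2.039457406430662, 0.368658508674493),
    'Neutrophils': (77.99791328032335, 190.20241175276928),
    'Respiratory rate': (18.968796364866186, 61.39900703518324),
    'Albumin': (3.0655902004454343, 0.8052631578947369),
}


def flag_mismatches(means, tolerance=1.5):
    """Return {feature: ratio} where the MIMIC/HiRID mean ratio falls outside [1/tolerance, tolerance]."""
    flagged = {}
    for feature, (mimic_mean, hirid_mean) in means.items():
        ratio = mimic_mean / hirid_mean
        if ratio > tolerance or ratio < 1 / tolerance:
            flagged[feature] = ratio
    return flagged


flagged = flag_mismatches(means)
```

With this tolerance all five features are flagged. A ratio close to a known conversion factor hints at a unit difference, while an odd ratio may point to a different lab test or unfiltered outliers.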
Columns in hirid_extract that are missing in mimic_extract (which were removed):
set(hirid_extract.columns).difference(mimic_extract.columns)
{'Alanine aminotransferase [Enzymatic activity/volume] in Serum or Plasma',
'Amylase [Enzymatic activity/volume] in Serum or Plasma',
'Aspartate aminotransferase [Enzymatic activity/volume] in Serum or Plasma',
'Bilirubin.direct [Mass/volume] in Serum or Plasma',
'Bilirubin.total [Moles/volume] in Serum or Plasma',
'CHARTTIME',
'Calcium.ionized [Moles/volume] in Blood',
'Carboxyhemoglobin/Hemoglobin.total in Arterial blood',
'DATE',
'EST_DISCHTIME',
'HOUR',
'Methemoglobin/Hemoglobin.total in Arterial blood',
'Pulmonary artery diastolic pressure',
'Pulmonary artery systolic pressure',
'discharge_status'}
Columns in mimic_extract that are missing in hirid_extract (which were removed):
set(mimic_extract.columns).difference(hirid_extract.columns)
{'Blood urea nitrogen',
'DEATHTIME',
'ETHNICITY',
'Hematocrit',
'Lactate dehydrogenase',
'Mean corpuscular volume',
'Red blood cell',
'White blood cell count',
'icustay_id',
'subject_id'}
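Both differences and the shared set can be computed in one pass. A small sketch with made-up column lists; `compare_columns` is a hypothetical helper.

```python
def compare_columns(hirid_columns, mimic_columns):
    """Return the columns unique to each extract and the columns they share."""
    hirid, mimic = set(hirid_columns), set(mimic_columns)
    return {
        'only_in_hirid': sorted(hirid - mimic),
        'only_in_mimic': sorted(mimic - hirid),
        'shared': sorted(hirid & mimic),
    }


# Toy column lists, only a few names from the sets above.
report = compare_columns(
    ['hadm_id', 'Sodium', 'CHARTTIME', 'HOUR'],
    ['hadm_id', 'Sodium', 'DEATHTIME', 'Hematocrit'],
)
```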
Final set of columns (same in both MIMIC and HiRID extracts):
['hadm_id', 'age', 'GENDER', 'Albumin', 'Bicarbonate', 'Calcium',
'Chloride', 'Temperature', 'Creatinine', 'Diastolic blood pressure',
'Glucose', 'Heart Rate', 'Hemoglobin', 'Prothrombin time INR',
'Lactic acid', 'Lymphocytes', 'Magnesium', 'Neutrophils',
'Oxygen saturation', 'Platelets', 'Potassium', 'Respiratory rate',
'Sodium', 'Systolic blood pressure', 'DISCHTIME', 'charttime',
'LABEL_48', 'ADMITTIME', 'Mortality', 'LOS']
MIMIC Longitudinal Data - https://drive.google.com/open?id=1WuO1OJZZyvd4RwM0hTWk-FIDJlCnDduk&usp=drive_fs
HIRID Longitudinal Data - https://drive.google.com/open?id=1Wyp9CMP6aiPJb6_O75VyswTL0hWlmlWC&usp=drive_fs
Take mean of hourly measures.
lab measurements
vital_signs = ['Heart Rate', 'Respiratory rate','Oxygen saturation', 'Systolic blood pressure', 'Diastolic blood pressure', 'Temperature']
labs_bmp = ['Glucose','Potassium','Sodium','Chloride', 'Creatinine', 'Blood urea nitrogen', 'Bicarbonate', 'Calcium', 'Albumin', 'Lactate dehydrogenase','Magnesium','Lactic acid']
labs_cbc = ['Hematocrit','Hemoglobin', 'Platelets', 'White blood cell count', 'Red blood cell count', 'Mean corpuscular volume', 'Lymphocytes', 'Neutrophils']
labs_cauglation = ['Prothrombin time INR']
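For the hourly mean mentioned above, a possible sketch with pandas. It assumes the column names `hadm_id` and `charttime` from the final column list, and that all measurement columns are numeric; `hourly_mean` is a hypothetical helper and the toy data is invented.

```python
import pandas as pd


def hourly_mean(df, id_col='hadm_id', time_col='charttime'):
    """Average every numeric measurement within each (admission, hour) bucket."""
    out = df.copy()
    # Truncate each timestamp to the start of its hour.
    out[time_col] = pd.to_datetime(out[time_col]).dt.floor('h')
    # mean() skips NaN by default; non-numeric columns are left out of the result.
    return out.groupby([id_col, time_col], as_index=False).mean(numeric_only=True)


toy = pd.DataFrame({
    'hadm_id': [1, 1, 1, 2],
    'charttime': [
        '2115-07-27 19:05:00',
        '2115-07-27 19:40:00',
        '2115-07-27 20:10:00',
        '2115-07-27 19:30:00',
    ],
    'Heart Rate': [80.0, 90.0, 100.0, 70.0],
})
hourly = hourly_mean(toy)
```

Static columns such as GENDER would need to be merged back afterwards, since `numeric_only=True` drops them.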