bugs in 4_write_omop.py or possibly the extracted data.

mostafaalishahi commented 10 months ago

Hi again, I am wondering how long this script (4_write_omop.py) should take to run. It took me more than 24 hours and then failed with the following error. I would appreciate any feedback on how to avoid rerunning the whole script and instead resume from where it left off. Also, what does X mean in "Chunk X/100"? Is it normal that I got 1890 out of 100 chunks?


Chunk 1890/100
collecting weight
collecting heart_rate
collecting invasive_systolic_blood_pressure
collecting invasive_diastolic_blood_pressure
collecting invasive_mean_blood_pressure
collecting noninvasive_systolic_blood_pressure
collecting noninvasive_diastolic_blood_pressure
collecting noninvasive_mean_blood_pressure
collecting O2_saturation
collecting lactate
collecting blood_glucose
collecting magnesium
collecting sodium
collecting creatinine
collecting calcium
collecting temperature
collecting FiO2
collecting hemoglobin
collecting chloride
collecting pH
collecting paO2
collecting paCO2
collecting plateau_pressure
collecting respiratory_rate_setting
collecting tidal_volume_setting
collecting potassium
collecting PTT
collecting bilirubine
collecting alanine_aminotransferase
collecting aspartate_aminotransferase
collecting respiratory_rate
collecting albumin
collecting blood_urea_nitrogen
collecting expiratory_tidal_volume
collecting white_blood_cells
collecting platelets
collecting phosphate
collecting bicarbonate
collecting alkaline_phosphatase
collecting PEEP
collecting urine_output
collecting glasgow_coma_score
collecting glasgow_coma_score_eye
Traceback (most recent call last):
  File "BlendedICU/4_write_omop.py", line 14, in <module>
    c.measurement_table()
  File "BlendedICU/blended_preprocessing/omop_conversion.py", line 331, in measurement_table
    self.measurement = self._add_measurement(varname,
  File "BlendedICU/blended_preprocessing/omop_conversion.py", line 270, in _add_measurement
    vals = timeseries.loc[:, ['time', 'patient', varname]]
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 1147, in __getitem__
    return self._getitem_tuple(key)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 1339, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 994, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 1382, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 1322, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexing.py", line 1520, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6114, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6178, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['glasgow_coma_score_eye'] not in index"

USM-CHU-FGuyon commented 10 months ago

Hi, I need to check this; I haven't used the OMOP-formatted data lately. Writing to OMOP does take a long time, but not that long.

In the meantime, the BlendedICU data can be used in its original form: medications and timeseries are stored as patient-level parquet files in blended_data/formatted_medications/ and blended_data/formatted_timeseries/, and the labels and flat data are in blended_data/preprocessed_labels.parquet and blended_data/preprocessed_flat.parquet.
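
For readers who want to start from those files, here is a minimal sketch of how they could be listed and loaded with pandas. The helper names `list_patient_files` and `load_patient` are hypothetical and not part of BlendedICU, and the sketch assumes each patient file ends in `.parquet` somewhere under the given folder. Reading parquet with pandas also requires a parquet engine such as pyarrow to be installed.

```python
from pathlib import Path

import pandas as pd


def list_patient_files(data_dir, subdir):
    """Return the sorted parquet files found under data_dir/subdir.

    Hypothetical helper: the exact folder layout is an assumption,
    so the search is recursive to tolerate nested subfolders.
    """
    return sorted(Path(data_dir, subdir).rglob("*.parquet"))


def load_patient(path):
    """Load one patient-level parquet file as a DataFrame.

    pandas.read_parquet needs a parquet engine (pyarrow or fastparquet).
    """
    return pd.read_parquet(path)
```

Typical use would be something like `files = list_patient_files("blended_data", "formatted_timeseries")` followed by `load_patient(files[0])`, and `pd.read_parquet("blended_data/preprocessed_labels.parquet")` for the labels.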

Thanks again for the feedback, getting back to you soon.

mostafaalishahi commented 10 months ago

Hi, could you please tell me whether it is normal that I am getting 1890 out of 100 chunks? What does X refer to in the "Chunk X/100" printout?

Thanks,

USM-CHU-FGuyon commented 10 months ago

Hi, there were several issues introduced by "quick changes" made during the review process. I'm done fixing them and this step now runs in ~10 hours. I will push the changes tomorrow.

1890 out of 100 was a printing error: the number of patients per chunk and the number of chunks were swapped. There will now actually be 100 chunks of ~3200 patients, not the opposite.
Measurement data was mostly filled with NaNs because this step initially assumed that timeseries were resampled to hourly data, which is no longer the case. This was fixed, and the data is a LOT smaller.
The KeyError that you had was fixed: if a variable cannot be found (i.e. if a batch of patients did not have this variable), it is skipped and will not appear in the corresponding measurement chunk.
I also added the option to start again from chunk X.
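
The behaviour described in these points can be illustrated with a short sketch. This is not the code in omop_conversion.py: the function names, the `start_chunk` parameter and the output column names (`value`, `variable`) are all hypothetical, and only the input columns `time` and `patient` come from the traceback above. It shows splitting patients into a fixed number of chunks, skipping variables that are absent from a chunk, dropping NaN rows, and resuming from a given chunk index.

```python
import pandas as pd


def make_chunks(patient_ids, n_chunks=100):
    """Split patient ids into n_chunks groups of near-equal size."""
    ids = list(patient_ids)
    size, extra = divmod(len(ids), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one more patient than the rest.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(ids[start:stop])
        start = stop
    return chunks


def collect_measurements(timeseries, variables):
    """Return long-format rows for the variables present in this chunk."""
    frames = []
    for varname in variables:
        if varname not in timeseries.columns:
            # No patient in this chunk has the variable: skip it
            # instead of letting .loc raise a KeyError.
            continue
        vals = timeseries.loc[:, ["time", "patient", varname]]
        vals = vals.dropna(subset=[varname])  # keep observed values only
        vals = vals.rename(columns={varname: "value"})
        frames.append(vals.assign(variable=varname))
    if not frames:
        return pd.DataFrame(columns=["time", "patient", "value", "variable"])
    return pd.concat(frames, ignore_index=True)


def run(chunks, load_chunk, variables, start_chunk=0):
    """Process chunks in order, skipping those before start_chunk."""
    for i, chunk in enumerate(chunks):
        if i < start_chunk:
            continue  # assumed to be already written by a previous run
        print(f"Chunk {i}/{len(chunks)}")
        yield i, collect_measurements(load_chunk(chunk), variables)
```

Here `load_chunk` stands for any callable that returns the timeseries DataFrame for a list of patient ids. With roughly 320,000 patients, `make_chunks(ids, 100)` gives 100 chunks of about 3,200 patients each, matching the figures quoted above.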

Thank you for your patience, getting back to you (very) soon.

USM-CHU-FGuyon commented 10 months ago

Hi, I tested it on my side and it runs with v0.1.5. You have to restart 4_write_omop.py from chunk 0, since there will now actually be 100 chunks. Note that measurement_table and drug_exposure_table can be launched in parallel for a slight speedup.
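
One way to launch the two steps side by side is sketched below, under several assumptions: that `drug_exposure_table` is a method on the same converter object as `measurement_table` (only the latter is confirmed by the traceback above), and that the two methods do not modify shared state. `FakeConverter` is a hypothetical stand-in used only to make the sketch runnable. Threads are used here for simplicity; running the two steps in separate processes or terminals may give a larger gain.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeConverter:
    """Hypothetical stand-in for the OMOP converter object."""

    def measurement_table(self):
        return "measurement done"

    def drug_exposure_table(self):
        return "drug_exposure done"


def run_tables_in_parallel(converter):
    """Run both table-building steps concurrently and collect results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "measurement": pool.submit(converter.measurement_table),
            "drug_exposure": pool.submit(converter.drug_exposure_table),
        }
        # result() blocks until the step finishes and re-raises any
        # exception that occurred inside it.
        return {name: future.result() for name, future in futures.items()}
```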

mostafaalishahi commented 10 months ago

Many thanks.

USM-CHU-FGuyon / BlendedICU

bugs in 4_write_omop.py or possibly the extracted data. #11