USM-CHU-FGuyon / BlendedICU

OMOP standardization pipeline for ICU databases
MIT License

data/amsterdam_data/preprocessed_labels.parquet does not exist. #10

Closed: mostafaalishahi closed this issue 7 months ago

mostafaalishahi commented 7 months ago

Hi,

I am trying to replicate your amazing work; however, after running the 3_blendedICU.py script I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: './data/amsterdam_data/preprocessed_labels.parquet'

Any help would be much appreciated.

USM-CHU-FGuyon commented 7 months ago

Hi, thank you for reporting this issue. I should be able to fix this soon; could you add the full stack trace of the error? This may be because you have not run the preprocessing steps for Amsterdam. In that case, removing "amsterdam" from the list of datasets in 3_blendedICU.py should solve the error:


from blended_preprocessing.flat_and_labels import blended_FLProcessor

flp = blended_FLProcessor(datasets=['mimic3',
                                    'hirid',
                                    'amsterdam', # remove the datasets for which preprocessing was not run
                                    'mimic',
                                    'eicu'])
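Instead of editing the list by hand, the datasets can be filtered on whether their labels file is already on disk. The sketch below is hypothetical: the helper `datasets_with_labels` is not part of BlendedICU, and it assumes every dataset writes to `./data/<name>_data/preprocessed_labels.parquet`, generalizing the Amsterdam path from the error message. The real directory names for the other datasets may differ.

```python
from pathlib import Path


def datasets_with_labels(datasets, data_root="./data"):
    """Split datasets into those whose preprocessed labels exist and those missing.

    Hypothetical helper. Assumes labels are written to
    <data_root>/<dataset>_data/preprocessed_labels.parquet (an assumption
    based on the Amsterdam path in the error, not on BlendedICU's code).
    """
    ready, missing = [], []
    for name in datasets:
        labels_path = Path(data_root) / f"{name}_data" / "preprocessed_labels.parquet"
        if labels_path.is_file():
            ready.append(name)
        else:
            missing.append(name)
    return ready, missing


if __name__ == "__main__":
    ready, missing = datasets_with_labels(["mimic3", "hirid", "amsterdam", "mimic", "eicu"])
    if missing:
        # These datasets most likely need their step 2 script re-run.
        print("No preprocessed_labels.parquet found for:", ", ".join(missing))
    print("Datasets that could be passed to blended_FLProcessor:", ready)
```

The `ready` list could then be passed as `blended_FLProcessor(datasets=ready)`, while `missing` points to the preprocessing scripts that still need to be run.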
mostafaalishahi commented 7 months ago

Thank you for your reply. I have run the preprocessing steps for Amsterdam, HiRID, MIMIC-IV, and eICU. By the way, I think no parquet file named preprocessed_labels is written for Amsterdam and eICU: after running 1_extract_amsterdam.py and 2_amsterdam.py, and 1_extract_eicu.py and 2_eicu.py, I do not get any preprocessed_labels.parquet file, whereas for MIMIC-IV and HiRID a preprocessed_labels.parquet file is included in the extracted data.

Please find below the full stacktrace of the error.

python 3_blendedICU.py 
Loading ./data/amsterdam_data/preprocessed_labels.parquet
Traceback (most recent call last):
  File "BlendedICU/3_blendedICU.py", line 13, in <module>
    flp = blended_FLProcessor(datasets=['amsterdam','hirid','mimic','eicu'])#'mimic3',
  File "BlendedICU/blended_preprocessing/flat_and_labels.py", line 19, in __init__
    self.labels = self._load_labels()
  File "BlendedICU/blended_preprocessing/flat_and_labels.py", line 27, in _load_labels
    return {d: self.load(p).reset_index() for d, p in labels_pths.items()}
  File "BlendedICU/blended_preprocessing/flat_and_labels.py", line 27, in <dictcomp>
    return {d: self.load(p).reset_index() for d, p in labels_pths.items()}
  File "BlendedICU/database_processing/dataprocessor.py", line 111, in load
    return pd.read_parquet(pth, **kwargs)
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/io/parquet.py", line 670, in read_parquet
    return impl.read(
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/io/parquet.py", line 265, in read
    path_or_handle, handles, filesystem = _get_path_or_handle(
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/io/parquet.py", line 139, in _get_path_or_handle
    handles = get_handle(
  File ".conda/envs/blended/lib/python3.9/site-packages/pandas/io/common.py", line 872, in get_handle
    handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: './data/amsterdam_data/preprocessed_labels.parquet'
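The traceback shows that `_load_labels` builds its dictionary with a comprehension, so it stops at the first missing file and reports only that one. Below is a minimal sketch of a loader that checks every path first and names all missing datasets in a single error. The function name `load_all_labels`, its `load` parameter, and the hint text are illustrative assumptions, not BlendedICU's actual code.

```python
from pathlib import Path
from typing import Callable, Dict, Mapping, TypeVar

T = TypeVar("T")


def load_all_labels(labels_paths: Mapping[str, str],
                    load: Callable[[str], T]) -> Dict[str, T]:
    """Load every dataset's labels, failing once with all missing paths listed.

    Hypothetical sketch: `load` stands in for a reader such as pandas.read_parquet.
    """
    # Check all paths up front instead of failing on the first missing one.
    missing = {d: p for d, p in labels_paths.items() if not Path(p).is_file()}
    if missing:
        details = "\n".join(f"  {d}: {p}" for d, p in missing.items())
        raise FileNotFoundError(
            "preprocessed_labels.parquet not found for:\n" + details
            + "\nRe-run the step 2 script for these datasets or remove them from `datasets`."
        )
    # Every file exists, so load them in the order of the input mapping.
    return {d: load(p) for d, p in labels_paths.items()}
```

With pandas installed, `load_all_labels(paths, lambda p: pd.read_parquet(p).reset_index())` would mirror the comprehension in the traceback while reporting Amsterdam and eICU together if both files were missing.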
USM-CHU-FGuyon commented 7 months ago

There was likely an issue when running step 2. What output do you get when running this snippet?

# Re-run only the label preprocessing of step 2 for Amsterdam
from amsterdam_preprocessing.flat_and_labels import Ams_FLProcessor

flp = Ams_FLProcessor()
flp.run_labels()  # logs where preprocessed_labels.parquet is saved

The output should tell you where preprocessed_labels.parquet was saved:

Loading D:/BLENDED_ICU_2/amsterdam_data/amsterdam_parquet//labels.parquet
Initial number of admissions : 22882
number of admissions after preprocessing : 20593
   saving D:/BLENDED_ICU_2/amsterdam_data/preprocessed_labels.parquet

(plus some pyarrow warnings due to the latest pandas version)

mostafaalishahi commented 7 months ago

Thanks, you are right: there was an issue when running step 2. I just re-ran it and everything works so far.