ds4dh / medical_concept_representation


Errors occur in medical_concept_representation "process_mimic.py" #1

Open AIforGenomics opened 3 months ago

AIforGenomics commented 3 months ago

Dear colleagues,

I used "process_mimic.py" to generate the sequence data, but the following issue occurred. Could you give me some advice?

First, I installed the recommended environment.

Then, I ran the command "python data/datasets/mimic-iv-2.2/process_mimic.py", but errors occurred as follows:

"
PRO_MAP.icd10pcs[PRO_MAP.icd10pcs == 'NoPCS'] =\
Process SpawnProcess-1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\queues.py", line 266, in _feed
    send_bytes(obj)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 289, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
OSError: [WinError 87] The parameter is incorrect.
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda\envs\ehr\Lib\concurrent\futures\process.py", line 251, in _process_worker
    call_item = call_queue.get(block=True)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\queues.py", line 103, in get
    res = self._recv_bytes()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 334, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 353, in _get_more_data
    assert left > 0
AssertionError
Traceback (most recent call last):
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\queues.py", line 266, in _feed
    send_bytes(obj)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 184, in send_bytes
    self._check_closed()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 137, in _check_closed
    raise OSError("handle is closed")
OSError: handle is closed
[the same "handle is closed" traceback repeats several more times]
Traceback (most recent call last):
  File "C:\Users\wangchao\Desktop\medical_concept_representation-main\data\datasets\mimic-iv-2.2\process_mimic.py", line 139, in <module>
    main()
  File "C:\Users\wangchao\Desktop\medical_concept_representation-main\data\datasets\mimic-iv-2.2\process_mimic.py", line 68, in main
    process_map(
  File "E:\Anaconda\envs\ehr\Lib\site-packages\tqdm\contrib\concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "E:\Anaconda\envs\ehr\Lib\site-packages\tqdm\contrib\concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "E:\Anaconda\envs\ehr\Lib\concurrent\futures\process.py", line 859, in map
    results = super().map(partial(_process_chunk, fn),
  File "E:\Anaconda\envs\ehr\Lib\concurrent\futures\_base.py", line 608, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
  File "E:\Anaconda\envs\ehr\Lib\concurrent\futures\process.py", line 813, in submit
    raise BrokenProcessPool(self._broken)
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
"

Thanks, Chao Wang

albornet commented 3 months ago

Hello,

Thanks for the comment! Indeed, I checked and found that I had left a few mistakes in the code.

I think the possible reasons for the error are:

1. I forgot to remove a debug line ("import pdb; pdb.set_trace()") in data/datasets/mimic-iv-2.2/mimic_utils.py (oops). It crashes the code when it is reached.
2. I inadvertently used ".csv" instead of ".csv.gz" for one MIMIC file loaded in data/datasets/mimic-iv-2.2/load_hosp_data.py. If the data files are not unzipped, the .csv will not be found.
3. I was using "tqdm.contrib.concurrent.process_map" for the multiprocessing in data/datasets/mimic-iv-2.2/process_mimic.py, which might be less stable than the classic multiprocessing.pool.imap.

I updated the code to address these potential issues: I removed the pdb line, I now load the ".csv.gz" files instead of ".csv", and I replaced tqdm's process_map with the more classic multiprocessing.pool.imap.
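For reference, the replacement follows the usual pattern of wrapping multiprocessing.pool.imap in a tqdm progress bar. This is only a minimal sketch with placeholder names; process_admission and admission_rows are illustrative, not the actual identifiers in process_mimic.py:

```python
import multiprocessing
from tqdm import tqdm

def process_admission(row):
    # Placeholder for the per-admission work done in process_mimic.py.
    return row

def main():
    admission_rows = list(range(1000))  # placeholder iterable of work items

    # Before: tqdm.contrib.concurrent.process_map(process_admission, admission_rows, ...)
    # After: a classic multiprocessing pool, with tqdm used only for the progress bar.
    with multiprocessing.Pool(processes=4) as pool:
        results = list(tqdm(
            pool.imap(process_admission, admission_rows, chunksize=1),  # a smaller chunksize uses less memory
            total=len(admission_rows),
        ))

if __name__ == "__main__":  # needed on Windows, where worker processes are spawned rather than forked
    main()
```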

I pushed these changes: https://github.com/ds4dh/medical_concept_representation/commit/5819705ac6d877dc9154755f619103f238e0934c

Hope this solves the issue! Let me know if not. It could also be related to the amount of memory available on your system; in that case, you could try to decrease chunksize here: https://github.com/ds4dh/medical_concept_representation/blob/5819705ac6d877dc9154755f619103f238e0934c/data/datasets/mimic-iv-2.2/process_mimic.py#L68

Best, Alban


AIforGenomics commented 3 months ago

Hello Alban, thanks for your reply. I really appreciate your dedication to this work. I'd like to report the latest status of the code. After modifying the code as mentioned above, it works when I set DEBUG = True, although it takes more time, about 48-72 hours (which I can accept). When I set DEBUG = False, even with chunksize=1 and N_CPUS_USED=1, it failed; the output messages are as follows.

"
PRO_MAP.icd10pcs[PRO_MAP.icd10pcs == 'NoPCS'] =\
Process SpawnPoolWorker-1:
Traceback (most recent call last):
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\pool.py", line 114, in worker
    task = get()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\queues.py", line 387, in get
    res = self._reader.recv_bytes()
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 334, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 353, in _get_more_data
    assert left > 0
AssertionError
Building train set: 0%| | 0/239769 [00:11<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\wangchao\Desktop\medical_concept_representation-main\data\datasets\mimic-iv-2.2\process_mimic.py", line 145, in <module>
    main()
  File "C:\Users\wangchao\Desktop\medical_concept_representation-main\data\datasets\mimic-iv-2.2\process_mimic.py", line 76, in main
    list(tqdm(
  File "E:\Anaconda\envs\ehr\Lib\site-packages\tqdm\std.py", line 1181, in __iter__
    for obj in iterable:
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\pool.py", line 873, in next
    raise value
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\pool.py", line 540, in _handle_tasks
    put(task)
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "E:\Anaconda\envs\ehr\Lib\multiprocessing\connection.py", line 289, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
OSError: [WinError 87] The parameter is incorrect.
"

I guess it might be caused by the multiprocessing steps.

Chao Wang

albornet commented 3 months ago

Hello again,

Maybe this is a Windows-specific multiprocessing issue? I saw this post, https://stackoverflow.com/questions/47692566/python-multiprocessing-apply-async-assert-left-0-assertionerror, with a similar error, also occurring on Windows. In that case, I can provide a Docker image that you can run with docker-compose just for the data-building step. Can you try to pull my changes and run the data preprocessing with Docker Compose? docker-compose up --build

EDIT: I added a few workarounds in data/datasets/mimic-iv-2.2/process_mimic.py (a thread-safe lock, plus defining the function used in the multiprocessing loop before the main function), which might solve potential issues on Windows, so simply running the script with these new changes might work as well :) A sketch of that pattern is shown below.
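For context, this is roughly what the Windows-safe setup looks like: the worker function lives at module level (so the spawn start method can pickle it), a shared lock is handed to each worker through the pool initializer, and the entry point is guarded. This is only a sketch under those assumptions; init_worker, build_sequence, and write_lock are illustrative names, not the actual code in process_mimic.py:

```python
import multiprocessing
from tqdm import tqdm

write_lock = None  # set in each worker process by init_worker

def init_worker(lock):
    # Give every worker process a handle to the shared lock.
    global write_lock
    write_lock = lock

def build_sequence(item):
    # Placeholder per-item work; the lock protects any shared resource (e.g. a log file).
    with write_lock:
        pass
    return item

def main():
    lock = multiprocessing.Manager().Lock()
    items = list(range(1000))  # placeholder work items
    with multiprocessing.Pool(processes=2, initializer=init_worker, initargs=(lock,)) as pool:
        list(tqdm(pool.imap(build_sequence, items), total=len(items)))

if __name__ == "__main__":  # required on Windows so spawned workers do not re-run main()
    main()
```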

If that also doesn't work, an easy workaround is to simply avoid multiprocessing. This takes more time, but it is not so bad, because you only need to run the pre-processing script once and then you have the patient data. For this, you just need to set DEBUG to True and comment out the line "if DEBUG: OUTPUT_DIR += '_debug'", as sketched after this paragraph.
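Concretely, the single-process fallback amounts to something like this (a sketch only; DEBUG, OUTPUT_DIR, and the loop body stand in for the corresponding pieces of process_mimic.py):

```python
DEBUG = True  # forces the sequential code path (slower, but sidesteps multiprocessing entirely)
OUTPUT_DIR = "processed"
# if DEBUG: OUTPUT_DIR += '_debug'  # commented out so results go to the real output directory

def build_sequence(item):
    return item  # placeholder for the per-item processing

items = list(range(10))  # placeholder work items
if DEBUG:
    # Same work as the multiprocessing branch, done one item at a time.
    results = [build_sequence(item) for item in items]
```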

Let me know if any of this gives promising results.

Best, Alban
