analysiscenter / radio

RadIO is a library for data science research of computed tomography imaging
https://analysiscenter.github.io/radio/
Apache License 2.0
222 stars 52 forks

RadIO.IV 'Too many open files' #30

Closed: henrique closed this issue 4 years ago

henrique commented 5 years ago

Hi guys,

Thanks for sharing your amazing work here. I was just trying to run your notebook 4 (radio/tutorials/RadIO.IV.ipynb) and ran into a weird error while parallelizing the split_dump in cell 9. I had a quick look at your code but couldn't see how to disable the multiprocessing in the pipeline. Any idea what could be causing this? I'm running it on a regular AWS p3 Ubuntu DL instance.

Thanks for any input


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>()

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in run(self, init_vars, *args, **kwargs)
   1279                 warnings.warn('Pipeline will never stop as n_epochs=None')
   1280 
-> 1281             for _ in self.gen_batch(*args, **kwargs):
   1282                 pass
   1283         return self

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _gen_batch(self, *args, **kwargs)
   1212             for batch in batch_generator:
   1213                 try:
-> 1214                     batch_res = self.execute_for(batch)
   1215                 except SkipBatchException:
   1216                     pass

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in execute_for(self, batch, new_loop)
    607             asyncio.set_event_loop(asyncio.new_event_loop())
    608         batch.pipeline = self
--> 609         batch_res = self._exec_all_actions(batch)
    610         batch_res.pipeline = self
    611         return batch_res

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_all_actions(self, batch, action_list)
    579                     join_batches = None
    580 
--> 581                 batch = self._exec_one_action(batch, _action, _action_args, _action['kwargs'])
    582 
    583             batch.pipeline = self

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_one_action(self, batch, action, args, kwargs)
    527                 batch.pipeline = self
    528                 action_method, _ = self._get_action_method(batch, action['name'])
--> 529                 batch = action_method(*args, **kwargs)
    530                 batch.pipeline = self
    531         return batch

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _action_wrapper(action_self, *args, **kwargs)
     42                 action_self.pipeline.get_variable(_lock_name).acquire()
     43 
---> 44         _res = action_method(action_self, *args, **kwargs)
     45 
     46         if _use_lock is not None:

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrapped_method(self, *args, **kwargs)
    323 
    324             if asyncio.iscoroutinefunction(method) or _target in ['async', 'a']:
--> 325                 x = wrap_with_async(self, args, kwargs)
    326             elif _target in ['threads', 't']:
    327                 x = wrap_with_threads(self, args, kwargs)

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrap_with_async(self, args, kwargs)
    289                 loop.run_until_complete(wait_for_all(futures, loop))
    290 
--> 291             return _call_post_fn(self, post_fn, futures, args, full_kwargs)
    292 
    293         def wrap_with_for(self, args, kwargs):

~/anaconda3/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _call_post_fn(self, post_fn, futures, args, kwargs)
    153                     traceback.print_tb(all_errors[0].__traceback__)
    154                 return self
--> 155             return post_fn(all_results, *args, **kwargs)
    156 
    157         def _prepare_args(self, args, kwargs):

~/anaconda3/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _post_default(self, list_of_arrs, update, new_batch, **kwargs)
    918         Output of each worker should correspond to individual patient.
    919         """
--> 920         self._reraise_worker_exceptions(list_of_arrs)
    921         res = self
    922         if update:

~/anaconda3/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _reraise_worker_exceptions(self, worker_outputs)
    865         if any_action_failed(worker_outputs):
    866             all_errors = self.get_errors(worker_outputs)
--> 867             raise RuntimeError("Failed parallelizing. Some of the workers failed with following errors: ", all_errors)
    868 
    869     def _post_custom_components(self, list_of_dicts, **kwargs):

RuntimeError: ('Failed parallelizing. Some of the workers failed with following errors: ', [OSError(24, 'Too many open files'), OSError(24, 'Too many open files'), ... the same OSError repeated for every worker ...])
star-yar commented 5 years ago

Got the same error; it seems RadIO opens too many files in parallel while writing batches. Any ideas for a quick fix?

AlexeyKozhevin commented 5 years ago

Please try restarting the Jupyter Notebook to make sure the error is reproducible.

star-yar commented 5 years ago

Running the code

import numpy as np
import pandas as pd

import radio
from radio import batchflow as bf
from radio import CTImagesMaskedBatch as CTIMB
from radio.pipelines import split_dump

# read annotation
nodules = pd.read_csv('../data/annotations.csv')

# create index and dataset
lunaix = bf.FilesIndex(path='../data/subsets/s*/*.mhd', no_ext=True)
lunaset = bf.Dataset(index=lunaix, batch_class=CTIMB)

lunaset.split(0.9, shuffle=True)

print(len(lunaset.train), len(lunaset.test))

SPACING = (1.7, 1.0, 1.0)  # spacing of scans after spacing unification
SHAPE = (400, 512, 512)  # shape of scans after resize
PADDING = 'reflect'  # 'reflect' padding mode produces the fewest artefacts
METHOD = 'pil-simd'  # robust resize engine

kwargs_default = dict(shape=SHAPE, spacing=SPACING, padding=PADDING, method=METHOD)

crop_pipeline = split_dump(cancer_path='../data/lunaset_split/train/cancer/', 
                           non_cancer_path='../data/lunaset_split/train/ncancer/',
                           nodules=nodules, fmt='raw', nodule_shape=(32, 64, 64),
                           batch_size=20, **kwargs_default)

(lunaset.train >> crop_pipeline).run()

Thrown exception

798 89
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-e1bfd42c8e9d> in <module>
     30                            batch_size=20, **kwargs_default)
     31 
---> 32 (lunaset.train >> crop_pipeline).run()

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in run(self, init_vars, *args, **kwargs)
   1279                 warnings.warn('Pipeline will never stop as n_epochs=None')
   1280 
-> 1281             for _ in self.gen_batch(*args, **kwargs):
   1282                 pass
   1283         return self

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _gen_batch(self, *args, **kwargs)
   1212             for batch in batch_generator:
   1213                 try:
-> 1214                     batch_res = self.execute_for(batch)
   1215                 except SkipBatchException:
   1216                     pass

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in execute_for(self, batch, new_loop)
    607             asyncio.set_event_loop(asyncio.new_event_loop())
    608         batch.pipeline = self
--> 609         batch_res = self._exec_all_actions(batch)
    610         batch_res.pipeline = self
    611         return batch_res

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_all_actions(self, batch, action_list)
    579                     join_batches = None
    580 
--> 581                 batch = self._exec_one_action(batch, _action, _action_args, _action['kwargs'])
    582 
    583             batch.pipeline = self

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_one_action(self, batch, action, args, kwargs)
    527                 batch.pipeline = self
    528                 action_method, _ = self._get_action_method(batch, action['name'])
--> 529                 batch = action_method(*args, **kwargs)
    530                 batch.pipeline = self
    531         return batch

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _action_wrapper(action_self, *args, **kwargs)
     42                 action_self.pipeline.get_variable(_lock_name).acquire()
     43 
---> 44         _res = action_method(action_self, *args, **kwargs)
     45 
     46         if _use_lock is not None:

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrapped_method(self, *args, **kwargs)
    323 
    324             if asyncio.iscoroutinefunction(method) or _target in ['async', 'a']:
--> 325                 x = wrap_with_async(self, args, kwargs)
    326             elif _target in ['threads', 't']:
    327                 x = wrap_with_threads(self, args, kwargs)

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrap_with_async(self, args, kwargs)
    289                 loop.run_until_complete(wait_for_all(futures, loop))
    290 
--> 291             return _call_post_fn(self, post_fn, futures, args, full_kwargs)
    292 
    293         def wrap_with_for(self, args, kwargs):

/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _call_post_fn(self, post_fn, futures, args, kwargs)
    153                     traceback.print_tb(all_errors[0].__traceback__)
    154                 return self
--> 155             return post_fn(all_results, *args, **kwargs)
    156 
    157         def _prepare_args(self, args, kwargs):

/usr/local/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _post_default(self, list_of_arrs, update, new_batch, **kwargs)
    918         Output of each worker should correspond to individual patient.
    919         """
--> 920         self._reraise_worker_exceptions(list_of_arrs)
    921         res = self
    922         if update:

/usr/local/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _reraise_worker_exceptions(self, worker_outputs)
    865         if any_action_failed(worker_outputs):
    866             all_errors = self.get_errors(worker_outputs)
--> 867             raise RuntimeError("Failed parallelizing. Some of the workers failed with following errors: ", all_errors)
    868 
    869     def _post_custom_components(self, list_of_dicts, **kwargs):

RuntimeError: ('Failed parallelizing. Some of the workers failed with following errors: ', [OSError(24, 'Too many open files'), OSError(24, 'Too many open files'), ... the same OSError repeated for every worker ...])
Isaver23 commented 5 years ago

Same error here. I restarted Jupyter and got the error again; I also tried reducing the number of files handled at once, but that failed as well.

Isaver23 commented 5 years ago

Solved this problem by:

  1. stop the Jupyter Notebook service
  2. run this bash command: ulimit -n 180000
  3. restart Jupyter Notebook

Hope this helps someone in the future :)
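
For anyone trying this, note that ulimit only affects the shell it runs in and the processes launched from that shell, which is why stopping Jupyter first and restarting it afterwards matters. A quick sanity-check sequence (standard bash, nothing RadIO-specific):

# inspect the current limits before changing anything
ulimit -Sn   # soft limit: exceeding it raises OSError 24 ('Too many open files')
ulimit -Hn   # hard limit: the ceiling an unprivileged user can raise the soft limit to

# raise the limit for this shell, then start Jupyter from the same shell
# (going above the hard limit requires root)
ulimit -n 180000
jupyter notebook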

Isaver23 commented 5 years ago

By the way, how long would it take to dump the 3D patches after running (lunaset.train >> crop_pipeline).run()?

Also, is there any way to track the rate of progress or control the number of patches generated?

star-yar commented 5 years ago

Also found a way to raise the open-file limit from within Python:

import resource

# raise the soft limit towards the hard limit (setrlimit needs integers)
soft_, hard_ = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (int(hard_ * 0.8), hard_))

But the whole issue is really about async jobs not being closed properly, and I couldn't find a proper solution yet. As a result you may also get a rather strange memory leak (see the attached screenshot).
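
A quick way to confirm whether file descriptors are actually leaking between batches is to watch the process's open-file count; a minimal sketch, assuming psutil is installed (it is not part of RadIO):

import os
import psutil

proc = psutil.Process(os.getpid())

# run this between pipeline runs: a steadily growing count means
# handles are being opened but never closed
print("open file descriptors:", proc.num_fds())   # Unix only
print("open files:", len(proc.open_files()))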

star-yar commented 5 years ago

By the way, how long would it take to dump the 3D patches after running (lunaset.train >> crop_pipeline).run()?

Also, is there any way to track the rate of progress or control the number of patches generated?

There is a way, as the docs show (https://analysiscenter.github.io/batchflow/intro/pipeline.html?highlight=bar), but I found it doesn't work properly.

roman-kh commented 5 years ago

@star-yar What's wrong with bar? We use it all the time without issues.
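
For reference, the batchflow docs linked above hook the progress bar into the run() call itself; a minimal sketch, assuming the pipeline from earlier in this thread and that the installed batchflow version supports these keywords:

# bar=True turns on the tqdm-style progress bar described in the linked docs;
# batch_size and n_epochs are illustrative values
(lunaset.train >> crop_pipeline).run(batch_size=20, n_epochs=1, bar=True)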

ListIndexOutOfRange commented 5 years ago

Hi guys! I got the same error and I can't get rid of it. I'm running JupyterLab on a remote computer. I tried ulimit -n 180000 and

import resource
soft_, hard_ = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (int(hard_ * 0.8), hard_))

but I still got the same error.

Does anyone have a neat trick ?

NuthanakantiBhaskar commented 4 years ago

I'm using Spyder in Anaconda for the following code but am getting an error:

from radio import CTImagesBatch
from dataset import FilesIndex, Dataset

dicom_ix = FilesIndex(path='LIDC-IDRI-0001/*', no_ext=True)         # set up the index
dicom_dataset = Dataset(index=dicom_ix, batch_class=CTImagesBatch)  # init the dataset of dicom files

The line from radio import CTImagesBatch raises:

ImportError: cannot import name 'CTImagesBatch' from 'radio' (D:\mywork\RADIO\radio.py)

Please suggest a fix.

roman-kh commented 4 years ago

Your file is named radio.py, which shadows the framework package name (radio). Rename your file (radio.py) and the directory (RADIO).
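
A quick way to verify which module Python actually imports (plain Python, nothing RadIO-specific):

import radio

# if this prints a path inside your project (e.g. ...\RADIO\radio.py) instead of
# site-packages, your local file is shadowing the installed package
print(radio.__file__)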

NuthanakantiBhaskar commented 4 years ago

Dear sir, thanks for the reply. After changing the names as you suggested, I'm getting one more error on from dataset import FilesIndex, Dataset:

ModuleNotFoundError: No module named 'dataset'

Please suggest a solution.

roman-kh commented 4 years ago

Obviously, you don't have the dataset module installed. Please follow the tutorial or the installation procedure.
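
For reference, the snippet earlier in this thread imports these classes from the batchflow package bundled with radio rather than from a standalone dataset module; a minimal sketch assuming that layout:

from radio import CTImagesBatch
from radio import batchflow as bf

# set up the index and the dataset of DICOM files, mirroring the
# FilesIndex/Dataset usage shown earlier in the thread
dicom_ix = bf.FilesIndex(path='LIDC-IDRI-0001/*', no_ext=True)
dicom_dataset = bf.Dataset(index=dicom_ix, batch_class=CTImagesBatch)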