Closed henrique closed 4 years ago
Got the same error. It seems RadIO's dump step opens too many files in parallel when writing batches. Any ideas for a quick fix?
Please try restarting the Jupyter Notebook kernel to make sure the error is reproducible.
Running the code
import numpy as np
import pandas as pd
import radio
from radio import batchflow as bf
from radio import CTImagesMaskedBatch as CTIMB
from radio.pipelines import split_dump
# read annotation
nodules = pd.read_csv('../data/annotations.csv')
# create index and dataset
lunaix = bf.FilesIndex(path='../data/subsets/s*/*.mhd', no_ext=True)
lunaset = bf.Dataset(index=lunaix, batch_class=CTIMB)
lunaset.split(0.9, shuffle=True)
print(len(lunaset.train), len(lunaset.test))
SPACING = (1.7, 1.0, 1.0) # spacing of scans after spacing unification
SHAPE = (400, 512, 512) # shape of scans after spacing unification
PADDING = 'reflect' # 'reflect' padding-mode produces the least amount of artefacts
METHOD = 'pil-simd' # robust resize-engine
kwargs_default = dict(shape=SHAPE, spacing=SPACING, padding=PADDING, method=METHOD)
crop_pipeline = split_dump(cancer_path='../data/lunaset_split/train/cancer/',
                           non_cancer_path='../data/lunaset_split/train/ncancer/',
                           nodules=nodules, fmt='raw', nodule_shape=(32, 64, 64),
                           batch_size=20, **kwargs_default)
(lunaset.train >> crop_pipeline).run()
Exception thrown:
798 89
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-e1bfd42c8e9d> in <module>
30 batch_size=20, **kwargs_default)
31
---> 32 (lunaset.train >> crop_pipeline).run()
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in run(self, init_vars, *args, **kwargs)
1279 warnings.warn('Pipeline will never stop as n_epochs=None')
1280
-> 1281 for _ in self.gen_batch(*args, **kwargs):
1282 pass
1283 return self
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _gen_batch(self, *args, **kwargs)
1212 for batch in batch_generator:
1213 try:
-> 1214 batch_res = self.execute_for(batch)
1215 except SkipBatchException:
1216 pass
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in execute_for(self, batch, new_loop)
607 asyncio.set_event_loop(asyncio.new_event_loop())
608 batch.pipeline = self
--> 609 batch_res = self._exec_all_actions(batch)
610 batch_res.pipeline = self
611 return batch_res
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_all_actions(self, batch, action_list)
579 join_batches = None
580
--> 581 batch = self._exec_one_action(batch, _action, _action_args, _action['kwargs'])
582
583 batch.pipeline = self
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/pipeline.py in _exec_one_action(self, batch, action, args, kwargs)
527 batch.pipeline = self
528 action_method, _ = self._get_action_method(batch, action['name'])
--> 529 batch = action_method(*args, **kwargs)
530 batch.pipeline = self
531 return batch
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _action_wrapper(action_self, *args, **kwargs)
42 action_self.pipeline.get_variable(_lock_name).acquire()
43
---> 44 _res = action_method(action_self, *args, **kwargs)
45
46 if _use_lock is not None:
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrapped_method(self, *args, **kwargs)
323
324 if asyncio.iscoroutinefunction(method) or _target in ['async', 'a']:
--> 325 x = wrap_with_async(self, args, kwargs)
326 elif _target in ['threads', 't']:
327 x = wrap_with_threads(self, args, kwargs)
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in wrap_with_async(self, args, kwargs)
289 loop.run_until_complete(wait_for_all(futures, loop))
290
--> 291 return _call_post_fn(self, post_fn, futures, args, full_kwargs)
292
293 def wrap_with_for(self, args, kwargs):
/usr/local/lib/python3.6/site-packages/radio/batchflow/batchflow/decorators.py in _call_post_fn(self, post_fn, futures, args, kwargs)
153 traceback.print_tb(all_errors[0].__traceback__)
154 return self
--> 155 return post_fn(all_results, *args, **kwargs)
156
157 def _prepare_args(self, args, kwargs):
/usr/local/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _post_default(self, list_of_arrs, update, new_batch, **kwargs)
918 Output of each worker should correspond to individual patient.
919 """
--> 920 self._reraise_worker_exceptions(list_of_arrs)
921 res = self
922 if update:
/usr/local/lib/python3.6/site-packages/radio/preprocessing/ct_batch.py in _reraise_worker_exceptions(self, worker_outputs)
865 if any_action_failed(worker_outputs):
866 all_errors = self.get_errors(worker_outputs)
--> 867 raise RuntimeError("Failed parallelizing. Some of the workers failed with following errors: ", all_errors)
868
869 def _post_custom_components(self, list_of_dicts, **kwargs):
RuntimeError: ('Failed parallelizing. Some of the workers failed with following errors: ', [OSError(24, 'Too many open files'), OSError(24, 'Too many open files'), ... the same OSError(24, 'Too many open files') repeated for every worker ...])
Same error here. I restarted Jupyter and got the error again, and reducing the number of files handled at once didn't help either.
Solved this problem by raising the open-file limit:
ulimit -n 180000
Hope this helps someone in the future :) ...
By the way, how long would it take to dump the 3D patches after running (lunaset.train >> crop_pipeline).run()? Also, is there any way to track the progress or to control the number of patches generated?
Also found a way to raise the open-file limit from within Python:
import resource
soft_, hard_ = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (int(hard_ * 0.8), hard_))  # setrlimit requires integers
But the whole issue is that the async jobs are not closed properly, and I couldn't find a proper solution for that yet. As a side effect you may also see a rather strange memory leak.
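To confirm that descriptors really do leak between batches, the process's open file descriptors can be counted directly. This is a Linux-only sketch (it reads /proc), and the pipeline call in the comment is just a placeholder, not RadIO's actual API:

```python
import os

def count_open_fds():
    """Count file descriptors currently open in this process (Linux-only)."""
    return len(os.listdir('/proc/self/fd'))

before = count_open_fds()
# ... run one batch of the pipeline here, e.g. pipeline.next_batch(...) ...
after = count_open_fds()
print(f"descriptors opened and not closed by this step: {after - before}")
```

If the count grows by a fixed amount per batch, that confirms handles are not being released, and raising ulimit only postpones the crash.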
By the way, how long would it take to dump the 3D patches after running (lunaset.train >> crop_pipeline).run()? Or, is there any way to track the progress or to control the number of patches generated?
There is a way, as the docs show (https://analysiscenter.github.io/batchflow/intro/pipeline.html?highlight=bar), but I found that it doesn't work properly.
@star-yar What's wrong with bar? We use it all the time without issues.
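If the built-in bar option misbehaves, any batch generator can be wrapped with a minimal progress printer instead. This is a framework-agnostic sketch; the gen_batch call in the comment is only an assumed usage, not verified against batchflow's API:

```python
import sys

def with_progress(iterable, total=None):
    """Yield items from `iterable`, printing a simple progress counter to stderr."""
    count = 0
    for count, item in enumerate(iterable, 1):
        suffix = f"/{total}" if total else ""
        sys.stderr.write(f"\rbatch {count}{suffix}")
        yield item
    sys.stderr.write(f"\rdone: {count} batches\n")

# Hypothetical usage with a batchflow pipeline:
#   for _ in with_progress(pipeline.gen_batch(batch_size=20, n_epochs=1)):
#       pass
processed = list(with_progress(range(5), total=5))
```

Because it only wraps iteration, it works with any generator and doesn't depend on the framework's own progress machinery.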
Hi guys! I got the same error and I can't get rid of it. I'm running JupyterLab on a remote computer. I tried ulimit -n 180000 and
import resource
soft_, hard_ = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (int(hard_ * 0.8), hard_))
but I still get the same error.
Does anyone have a neat trick ?
I'm using Spyder in Anaconda for the following code but getting an error:
from radio import CTImagesBatch
from dataset import FilesIndex, Dataset
dicom_ix = FilesIndex(path='LIDC-IDRI-0001/*', no_ext=True)  # set up the index
dicom_dataset = Dataset(index=dicom_ix, batch_class=CTImagesBatch)  # init the dataset of dicom files
It shows the error:
from radio import CTImagesBatch
ImportError: cannot import name 'CTImagesBatch' from 'radio' (D:\mywork\RADIO\radio.py)
So please suggest a fix.
Your file is named radio.py, which conflicts with the framework package name (radio). So rename your file (radio.py) and the directory (RADIO).
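A quick way to catch this kind of shadowing is to print where Python actually resolved the module from. The sketch below uses the stdlib json module as a stand-in, since radio may not be installed everywhere:

```python
import importlib

# Print the file a module was imported from. If a local file shadows an
# installed package, the path shown here points into your working directory
# instead of site-packages.
mod = importlib.import_module('json')  # replace 'json' with 'radio'
print(mod.__file__)
```

If the printed path is your own radio.py rather than a site-packages directory, Python imported the local file and the rename is the fix.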
Dear sir, thanks for the reply. After changing the names as per your suggestion, I'm getting one more error:
from dataset import FilesIndex, Dataset
ModuleNotFoundError: No module named 'dataset'
So please suggest a solution.
Obviously, you don't have the dataset module. Please follow the tutorial or the installation procedure.
Hi guys,
Thanks for sharing your amazing work here. I was just trying to run your notebook radio/tutorials/RadIO.IV.ipynb and ran into a weird error while parallelizing split_dump in cell 9. I had a quick look at your code but can't see how to disable the multiprocessing in the pipeline. Any idea what could be causing this? I'm running it on a regular AWS p3 Ubuntu DL instance.
Thanks for any input