MIC-DKFZ / nnUNet

Apache License 2.0

RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message #2166

Closed: li-dailin closed this issue 4 months ago

li-dailin commented 6 months ago

Hi, I'm having some trouble running nnUNet on the Synapse BCV dataset. I run it in an Anaconda PowerShell prompt, and the same error occurs for both 2D and 3D training. I am currently using version 2.2.1; I also tried the latest version and it reported the very same error. I noticed that several different processes fail, each with an AttributeError, which looks a bit different from what I saw in previous issues, so I opened a separate one. The output is as follows:

2024-05-08 06:03:36.244731: unpacking dataset...
2024-05-08 06:04:31.883632: unpacking done...
2024-05-08 06:04:31.889076: do_dummy_2d_data_aug: False
2024-05-08 06:04:31.893605: Creating new 5-fold cross-validation split...
2024-05-08 06:04:31.899590: Desired fold for training: 0
2024-05-08 06:04:31.905573: This split has 24 training and 6 validation cases.
D:\conda\lib\site-packages\torch\onnx\symbolic_helper.py:1466: UserWarning: ONNX export mode is set to TrainingMode.EVAL, but operator 'instance_norm' is set to train=True. Exporting with train=True.
  warnings.warn(
2024-05-08 06:04:38.226775: Unable to plot network architecture:
2024-05-08 06:04:38.230765: failed to execute WindowsPath('dot'), make sure the Graphviz executables are on your systems' PATH
2024-05-08 06:04:38.259687:
2024-05-08 06:04:38.264674: Epoch 0
2024-05-08 06:04:38.269661: Current learning rate: 0.01
using pin_memory on device 0
Process Process-5:
Traceback (most recent call last):
  File "D:\conda\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\conda\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 41, in producer
    with threadpool_limits(1, None):
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Process Process-6:
Traceback (most recent call last):
  File "D:\conda\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\conda\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 41, in producer
    with threadpool_limits(1, None):
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Process Process-7:
Traceback (most recent call last):
  File "D:\conda\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\conda\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 41, in producer
    with threadpool_limits(1, None):
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Process Process-8:
Traceback (most recent call last):
  File "D:\conda\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\conda\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 41, in producer
    with threadpool_limits(1, None):
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Process Process-9:
Traceback (most recent call last):
  File "D:\conda\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "D:\conda\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 41, in producer
    with threadpool_limits(1, None):
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 171, in __init__
    self._original_info = self._set_threadpool_limits()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
    modules = _ThreadpoolInfo(prefixes=self._prefixes,
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 340, in __init__
    self._load_modules()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
    self._find_modules_with_enum_process_module_ex()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
    self._make_module_from_path(filepath)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "D:\conda\lib\site-packages\threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Exception in thread Thread-4:
Traceback (most recent call last):
  File "D:\conda\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "D:\conda\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
  File "D:\conda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\conda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\AppData\Roaming\Python\Python39\Scripts\nnUNetv2_train.exe\__main__.py", line 7, in <module>
  File "f:\ldl\nnunet\nnunetv2\run\run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "f:\ldl\nnunet\nnunetv2\run\run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "f:\ldl\nnunet\nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py", line 1279, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "C:\Users\user\AppData\Roaming\Python\Python39\site-packages\batchgenerators\dataloading\nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
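
The log above has a fixed shape: each data-augmentation worker (Process-5 to Process-9) prints its own traceback and dies, and only afterwards does the consumer side (`results_loop` in Thread-4, then `__get_next_item` in the main process) notice that workers are gone and raise the generic RuntimeError. The following is a minimal sketch of that pattern, using threads instead of processes so it stays self-contained; `crashing_worker`, `healthy_worker`, and `next_batch` are hypothetical names, not batchgenerators' API.

```python
import queue
import threading
from typing import List


def crashing_worker(out_q: queue.Queue) -> None:
    # Hypothetical stand-in for a worker that dies before producing a batch,
    # as each Process-N in the log did inside threadpool_limits(1, None).
    raise AttributeError("'NoneType' object has no attribute 'split'")


def healthy_worker(out_q: queue.Queue) -> None:
    # Hypothetical stand-in for a worker that produces one batch and exits.
    out_q.put(42)


def next_batch(workers: List[threading.Thread], out_q: queue.Queue,
               poll: float = 0.05, max_wait: float = 2.0):
    """Return the next batch, or raise RuntimeError once any worker has died."""
    waited = 0.0
    while waited < max_wait:
        try:
            return out_q.get(timeout=poll)
        except queue.Empty:
            waited += poll
            # The consumer only checks liveness; it never sees *why* a worker
            # died. The worker's traceback was already printed to stderr by
            # the default threading.excepthook.
            if any(not w.is_alive() for w in workers):
                raise RuntimeError(
                    "One or more background workers are no longer alive. "
                    "Exiting. Please check the print statements above for "
                    "the actual error message")
    raise TimeoutError("no batch produced within max_wait seconds")


if __name__ == "__main__":
    q = queue.Queue()
    workers = [threading.Thread(target=crashing_worker, args=(q,)) for _ in range(4)]
    for w in workers:
        w.start()
    try:
        next_batch(workers, q)
    except RuntimeError as err:
        print("consumer saw:", err)
```

Because the consumer's RuntimeError carries none of the worker's exception, the first worker traceback printed above it (here the AttributeError) is the one worth debugging.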

I would appreciate any help :-)

Karol-G commented 6 months ago

Hey,

This seems to be the relevant error message: AttributeError: 'NoneType' object has no attribute 'split'

Your split file seems to be corrupted. Try deleting it and then running the training again. It will generate a new one automatically.

Best regards, Karol
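
In the quoted AttributeError, `split` is a method call: the last frame of every worker traceback is `config = get_config().split()` in threadpoolctl's `get_version` (threadpoolctl.py, line 646), so `get_config()` returned `None` instead of a byte string. The sketch below reproduces that failure mode in isolation. The functions `parse_blas_version` and `unguarded_version`, and the assumed `b"OpenBLAS <version> ..."` config layout, are illustrative assumptions, not threadpoolctl's actual code.

```python
from typing import Callable, Optional


def unguarded_version(get_config: Callable[[], Optional[bytes]]) -> bytes:
    # Mirrors the failing line in the traceback: get_config().split()
    # If get_config() returns None, this raises
    # AttributeError: 'NoneType' object has no attribute 'split'
    return get_config().split()[1]


def parse_blas_version(get_config: Callable[[], Optional[bytes]]) -> Optional[str]:
    """Return the version token from a config string, or None if unavailable.

    `get_config` is a hypothetical stand-in for the native config getter
    that threadpoolctl queries; it may return None.
    """
    config = get_config()
    if config is None:
        # Guard against the None that crashed the workers above.
        return None
    tokens = config.split()
    # Assumed layout: b"OpenBLAS <version> <build flags...>"
    if len(tokens) >= 2 and tokens[0] == b"OpenBLAS":
        return tokens[1].decode("utf-8")
    return None


if __name__ == "__main__":
    print(parse_blas_version(lambda: b"OpenBLAS 0.3.21 DYNAMIC_ARCH"))
    print(parse_blas_version(lambda: None))
```

The guarded version simply reports "unknown version" instead of crashing; the unguarded one fails in exactly the way every worker in the log did.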