Closed CoderJackZhu closed 1 year ago
Hi there, interesting stuff. I have never had that problem. Let's see: Google tells me that this error message mostly appears when thread limits are exceeded, but that does not appear to be your problem.
Can you please try the following:
OMP_NUM_THREADS=1 nnUNetv2_train 137 3d_fullres 0 --npz
Can you please also confirm that this appears on other hardware (if possible)? If you are running this on a compute cluster, it might make sense to try an entirely different setup (such as a local workstation) to make sure it's not a configuration problem of the operating system.
The error appears in an external library that nnU-Net (or rather batchgenerators) is using. Maybe it would help to also open an issue there and ask for advice: https://github.com/joblib/threadpoolctl
Thank you very much. After following your instruction OMP_NUM_THREADS=1 nnUNetv2_train 137 3d_fullres 0 --npz
, I succeeded in solving this problem. I have not tried to run this code on another computer.
OK, thanks for the feedback. It appears that we still need OMP_NUM_THREADS. I was hoping we could ignore that 🙈 On our systems, at least, it works without it. Too bad...
Can I ask you to test something for me? Doesn't take long
I'm willing to help you, but my machine seems to have something wrong and crashed; it may take until tomorrow or later. The crash may not have been caused by this program. I will let you know when the machine is back to normal.
Is there any test I need to do? The machine seems to be back to normal.
Can you please try the following?
import os
os.environ['OMP_NUM_THREADS']=1
Thanks!
(/data/ailab/2022/ZYJ/nnunet) [stu0301@gpu03 nnUNet]$ nnUNetv2_train 137 3d_fullres 3 --npz
Traceback (most recent call last):
File "/data/ailab/2022/ZYJ/nnunet/bin/nnUNetv2_train", line 33, in
My bad, the code I sent you is wrong. It should be:
import os
os.environ['OMP_NUM_THREADS']="1"
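Note that the value must be a string, and that it only takes effect for libraries imported after it is set. A minimal sketch (numpy here is just an example of a library that spawns a BLAS/OpenMP thread pool):

```python
import os

# Must be set *before* importing libraries that create BLAS/OpenMP thread
# pools (numpy, scipy, torch, ...), and the value must be a string.
os.environ['OMP_NUM_THREADS'] = "1"

import numpy as np  # imported only after the variable is set
```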
The code works correctly
Fantastic :-) Thanks! I will need to wait a bit to see if more people have the same problem, and if so I will have to reintroduce OMP_NUM_THREADS (I dropped it when moving from v1 to v2 to make the installation simpler).
I ran into the same error. Using os.environ['OMP_NUM_THREADS']="1" did not help. It may be that too many threads were open, which nvidia-smi cannot show. After I killed all the processes with fuser -v /dev/nvidia* | awk '{for(i=1;i<=NF;i++)print "kill -9 " $i;}' | sh, it was able to run.
?
I am still getting the same error even after running with OMP_NUM_THREADS=1. My hardware configuration: MacBook Pro (early 2015), macOS 10.14, RAM: 16 GB, GPU: Intel Iris Graphics 6100 1536 MB.
(base) Akshays-MacBook-Pro:~ akshay$ OMP_NUM_THREADS=1 nnUNetv2_train 1 2d 0 -device cpu
/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.0)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Using device: cpu
####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. #######################################################################
This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 13, 'patch_size': [768, 320], 'median_image_size_in_voxels': [3180.0, 1498.5], 'spacing': [1.0, 1.0], 'normalization_schemes': ['ZScoreNormalization'], 'use_mask_for_norm': [True], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'num_pool_per_axis': [7, 6], 'pool_op_kernel_sizes': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 1]], 'conv_kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'unet_max_num_features': 512, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': True}
These are the global plan.json settings: {'dataset_name': 'Dataset001_InBreast', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [999.0, 1.0, 1.0], 'original_median_shape_after_transp': [1, 3180, 1498], 'image_reader_writer': 'NaturalImage2DIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 255.0, 'mean': 171.5933231709939, 'median': 176.0, 'min': 69.0, 'percentile_00_5': 101.0, 'percentile_99_5': 230.0, 'std': 30.106854637129793}}}
2023-06-19 20:05:54.976702: unpacking dataset...
/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.0)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.0)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
2023-06-19 20:06:02.614134: unpacking done...
2023-06-19 20:06:02.616596: do_dummy_2d_data_aug: False
2023-06-19 20:06:02.618724: Using splits from existing split file: /Users/akshay/Documents/Master Project/Breast segmenatation Unet/nnUNet_preprocessed/Dataset001_InBreast/splits_final.json
2023-06-19 20:06:02.619395: The split file contains 5 splits.
2023-06-19 20:06:02.619575: Desired fold for training: 0
2023-06-19 20:06:02.619777: This split has 68 training and 18 validation cases.
2023-06-19 20:06:02.789835: Unable to plot network architecture:
2023-06-19 20:06:02.790218: No module named 'hiddenlayer'
2023-06-19 20:06:02.818363:
2023-06-19 20:06:02.818888: Epoch 0
2023-06-19 20:06:02.819498: Current learning rate: 0.01
/opt/anaconda3/lib/python3.9/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.0)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
[the warning above is repeated once by each worker process]
Process Process-6:
Process Process-4:
Process Process-3:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer
with threadpool_limits(1, None):
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 371, in _load_modules
self._find_modules_with_dyld()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 428, in _find_modules_with_dyld
self._make_module_from_path(filepath)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
module = module_class(filepath, prefix, user_api, internal_api)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer
with threadpool_limits(1, None):
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 371, in _load_modules
self._find_modules_with_dyld()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 428, in _find_modules_with_dyld
self._make_module_from_path(filepath)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
module = module_class(filepath, prefix, user_api, internal_api)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer
with threadpool_limits(1, None):
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 371, in _load_modules
self._find_modules_with_dyld()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 428, in _find_modules_with_dyld
self._make_module_from_path(filepath)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
module = module_class(filepath, prefix, user_api, internal_api)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Process Process-5:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/opt/anaconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer
with threadpool_limits(1, None):
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 340, in __init__
self._load_modules()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 371, in _load_modules
self._find_modules_with_dyld()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 428, in _find_modules_with_dyld
self._make_module_from_path(filepath)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
module = module_class(filepath, prefix, user_api, internal_api)
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "/opt/anaconda3/lib/python3.9/site-packages/threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
Exception in thread Thread-4:
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/opt/anaconda3/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/opt/anaconda3/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Traceback (most recent call last):
File "/opt/anaconda3/bin/nnUNetv2_train", line 8, in
For single-threaded nnU-Net, use:
nnUNet_n_proc_DA=0 nnUNetv2_train 1 2d 0 -device cpu
Hi, I also get the following error:
Traceback (most recent call last):
File "/usr/local/bin/nnUNetv2_train", line 33, in
I tried setting the following environment variables: os.environ['OMP_NUM_THREADS'] = "1" and os.environ['nnunet_proc_DA'] = "6".
Neither of them seems to work. It's strange because I ran other trainings before and everything went fine. Does anyone have a suggestion?
Hello,
Same problem as @giuliarubiu above. I did other trainings previously with the exact same code and it was totally fine. Now, all of a sudden, I am getting this error, even with the thread number set to 1 (OMP_NUM_THREADS=1).
I am also having the same error while running it on an Azure compute instance (12 vCPU cores, 220 GB RAM, and 1x Nvidia V100 16GB). Training runs for some epochs and then freezes with zero GPU activity. I tried OMP_NUM_THREADS=1 but it still does not work... any further suggestions?
Hello, the same problem as @giuliarubiu. I tried OMP_NUM_THREADS=1 and also nnUNet_n_proc_DA=0, but neither worked. I've tried it on an RTX 8000 48GB. Can you please help me solve this problem?
I have also tried both of those suggestions, and neither worked for me either :(
@FabianIsensee
Hi FabianIsensee,
I use nnU-Net v2. When I add --c after the training command to continue training, the problem above occurs:
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
None of the methods above solved it. Is there any other way to fix this?
I found that without --c, training runs normally with CUDA_VISIBLE_DEVICES=7 nnUNetv2_train 2 3d_fullres 4 --npz. But after running 1000 epochs, when I change the number of epochs to 2000 and continue training with CUDA_VISIBLE_DEVICES=7 nnUNetv2_train 2 3d_fullres 4 --npz --c, the problem occurs:
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
I ran into a similar problem. I traced it to the fact that the disk mounted inside the Docker container and the disk storing the model training data were not the same one; once I put both on the same disk, the problem was solved.
When you hit this type of error, you can also check whether the preprocessed data in the nnUNet_preprocessed folder is correct. (I usually eyeball the file sizes; if some files are only a few hundred KB, or 0 bytes, that may mean preprocessing did not run to completion.)
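A rough sketch of that manual check (the folder path and size threshold are assumptions; adjust them for your dataset):

```python
import os

def find_suspicious_files(folder, min_bytes=100_000):
    """Walk `folder` and return files smaller than `min_bytes`,
    which may indicate preprocessing did not finish."""
    suspicious = []
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) < min_bytes:
                suspicious.append(path)
    return suspicious

# Example (hypothetical path):
# print(find_suspicious_files("/path/to/nnUNet_preprocessed/Dataset137_BraTS2021"))
```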
Thank you for your great project. The problem still appears when running inference on a large number of cases. I worked around it by adding --c and running the command again. Please let me know in case of an update. Thank you.
Facing the same problem
Hey, is there a solution to this? I am receiving the same error.
Same here! I have the same problem
I still run into the same issue from time to time when training a model, even when using the latest version of nnU-Net. I usually run only a few epochs at a time, either 50 or 100, since the framework saves a checkpoint every 50 epochs and you can continue training from it.
Just change self.num_epochs = 1000 in the script https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py to, for example, self.num_epochs = 100.
You can then run the code for 100 epochs. After that, change it to self.num_epochs = 200 and continue training with --c; it will keep using the model developed so far.
A more elegant way is self.num_epochs = int(input()).
Hope this helps.
Best
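The chunked-training workflow described above can be sketched like this (the class and method names are illustrative, not the real nnUNetTrainer API; only num_epochs mirrors the attribute being edited):

```python
# Illustrative sketch of training in resumable chunks. Each "run" trains up to
# self.num_epochs; before resuming with --c, the target is raised by one chunk.
class ChunkedSchedule:
    def __init__(self, chunk_size=100):
        self.chunk_size = chunk_size
        self.num_epochs = chunk_size  # first run: train `chunk_size` epochs

    def extend(self):
        # called before each resumed run: raise the target by one more chunk
        self.num_epochs += self.chunk_size
        return self.num_epochs

schedule = ChunkedSchedule(100)
schedule.extend()  # next resumed run trains up to epoch 200
```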
If you are running nnU-Net in Docker, you may need to set the --shm-size parameter: the default shm size is 64 MB, which may not be enough. For example: nvidia-docker run -it --name xxx --shm-size 32g image_id bash.
I also ran into this problem but could not solve it. Could you give more detail?
Hi all, may I ask: on Windows, how can we set OMP_NUM_THREADS=1?
When I run this:
OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 nnUNetv2_train Dataset701_AbdomenCT 2d all -tr nnUNetTrainerUMambaBot -device cuda
it fails with: 'OMP_NUM_THREADS' is not recognized as an internal or external command, operable program or batch file.
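The VAR=value prefix is Bash syntax and does not work in cmd.exe. On Windows, set the variable in its own command first — set OMP_NUM_THREADS=1 in cmd.exe, or $env:OMP_NUM_THREADS = "1" in PowerShell — and then run nnUNetv2_train on the next line. A cross-platform alternative is setting it in Python before any heavy imports (a minimal sketch):

```python
import os

# Works on Windows, Linux, and macOS alike; must run before importing
# numpy/torch so the thread pools pick it up.
os.environ["OMP_NUM_THREADS"] = "1"
```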
Did you solve it? I ran into the same problem.
When I train this model, I always get this error. I don't know why, and I wonder how to solve this problem. Thank you. (/data/ailab/2022/ZYJ/nnunet) [stu0301@gpu03 nnUNet]$ nnUNetv2_train 137 3d_fullres 0 --npz Using device: cuda:0
####################################################################### Please cite the following paper when using nnU-Net: Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211. #######################################################################
This is the configuration used by this training: Configuration name: 3d_fullres {'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [128, 128, 128], 'median_image_size_in_voxels': [140.0, 171.0, 137.0], 'spacing': [1.0, 1.0, 1.0], 'normalization_schemes': ['ZScoreNormalization', 'ZScoreNormalization', 'ZScoreNormalization', 'ZScoreNormalization'], 'use_mask_for_norm': [True, True, True, True], 'UNet_class_name': 'PlainConvUNet', 'UNet_base_num_features': 32, 'n_conv_per_stage_encoder': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2], 'num_pool_per_axis': [5, 5, 5], 'pool_op_kernel_sizes': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]], 'conv_kernel_sizes': [[3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'unet_max_num_features': 320, 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'batch_dice': False}
These are the global plan.json settings: {'dataset_name': 'Dataset137_BraTS2021', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.0, 1.0, 1.0], 'original_median_shape_after_transp': [140, 171, 137], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 95242.25, 'mean': 871.816650390625, 'median': 407.0, 'min': 0.10992202162742615, 'percentile_00_5': 55.0, 'percentile_99_5': 5825.0, 'std': 2023.5313720703125}, '1': {'max': 1905559.25, 'mean': 1698.2144775390625, 'median': 552.0, 'min': 0.0, 'percentile_00_5': 47.0, 'percentile_99_5': 8322.0, 'std': 18787.4140625}, '2': {'max': 4438107.0, 'mean': 2141.349365234375, 'median': 738.0, 'min': 0.0, 'percentile_00_5': 110.0, 'percentile_99_5': 10396.0, 'std': 45159.37890625}, '3': {'max': 580014.3125, 'mean': 995.436279296875, 'median': 512.3143920898438, 'min': 0.0, 'percentile_00_5': 108.0, 'percentile_99_5': 11925.0, 'std': 4629.87939453125}}}
2023-03-23 22:53:42.012139: unpacking dataset...
2023-03-23 22:57:15.189137: unpacking done...
2023-03-23 22:57:15.190897: do_dummy_2d_data_aug: False
2023-03-23 22:57:15.205438: Using splits from existing split file: /data/ailab/2022/ZYJ/Dataset/nnUNet_preprocessed/Dataset137_BraTS2021/splits_final.json
2023-03-23 22:57:15.207055: The split file contains 5 splits.
2023-03-23 22:57:15.207163: Desired fold for training: 0
2023-03-23 22:57:15.207259: This split has 1000 training and 251 validation cases.
2023-03-23 22:57:15.395872: Unable to plot network architecture:
2023-03-23 22:57:15.396091: No module named 'hiddenlayer'
2023-03-23 22:57:23.167642:
2023-03-23 22:57:23.167933: Epoch 0
2023-03-23 22:57:23.168380: Current learning rate: 0.01
using pin_memory on device 0
OpenBLAS blas_thread_init: pthread_create failed for thread 20 of 64: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max
[the two lines above repeat for threads 21 through 63]
Process Process-21:
Traceback (most recent call last):
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer
with threadpool_limits(1, None):
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 373, in __init__
super().__init__(ThreadpoolController(), limits=limits, user_api=user_api)
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 166, in __init__
self._set_threadpool_limits()
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 299, in _set_threadpool_limits
lib_controller.set_num_threads(num_threads)
File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 865, in set_num_threads
return set_func(num_threads)
KeyboardInterrupt
using pin_memory on device 0
OpenBLAS blas_thread_init: pthread_create failed for thread 19 of 64: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max
[the two lines above repeat for threads 20 through 58, where the pasted log ends]
unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max OpenBLAS blas_thread_init: pthread_create failed for thread 59 of 64: Resource temporarily unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max OpenBLAS blas_thread_init: pthread_create failed for thread 60 of 64: Resource temporarily unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max OpenBLAS blas_thread_init: pthread_create failed for thread 61 of 64: Resource temporarily unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max OpenBLAS blas_thread_init: pthread_create failed for thread 62 of 64: Resource temporarily unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max OpenBLAS blas_thread_init: pthread_create failed for thread 63 of 64: Resource temporarily unavailable OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 513046 max Process Process-22: Traceback (most recent call last): File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/multiprocessing/process.py", line 108, in run self._target(*self._args, self._kwargs) File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 41, in producer with threadpool_limits(1, None): File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 373, in init super().init(ThreadpoolController(), limits=limits, user_api=user_api) File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 166, in init self._set_threadpool_limits() File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 299, in _set_threadpool_limits lib_controller.set_num_threads(num_threads) File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/threadpoolctl.py", line 865, in set_num_threads return set_func(num_threads) 
KeyboardInterrupt Exception in thread Thread-5: Traceback (most recent call last): File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/threading.py", line 980, in _bootstrap_inner self.run() File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/threading.py", line 917, in run self._target(*self._args, *self._kwargs) File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop raise e File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the " RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message Traceback (most recent call last): File "/data/ailab/2022/ZYJ/nnunet/bin/nnUNetv2_train", line 33, in
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/data/ailab/2022/ZYJ/nnUNet/nnunetv2/run/run_training.py", line 247, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/data/ailab/2022/ZYJ/nnUNet/nnunetv2/run/run_training.py", line 190, in run_training
    nnunet_trainer.run_training()
  File "/data/ailab/2022/ZYJ/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1217, in run_training
    val_outputs.append(self.validation_step(next(self.dataloader_val)))
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 196, in __next__
    item = self.__get_next_item()
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 181, in __get_next_item
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/data/ailab/2022/ZYJ/nnunet/lib/python3.9/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
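For anyone landing here: the workaround discussed above can be sketched as a couple of lines at the very top of the entry script, before numpy (and with it OpenBLAS) gets imported, so each background worker only creates a single BLAS thread instead of 64. Note that `os.environ` only accepts string values, which is why the earlier snippet in this thread that assigned the int `1` failed:

```python
import os

# Must run before numpy/OpenBLAS is imported for the limit to take effect.
# Environment variable values have to be strings:
os.environ['OMP_NUM_THREADS'] = "1"

# Assigning an int instead raises TypeError -- this is the bug in the
# first snippet posted in this thread:
try:
    os.environ['OMP_NUM_THREADS'] = 1
except TypeError:
    pass  # the earlier string assignment is still in effect

print(os.environ['OMP_NUM_THREADS'])
```

Equivalently, setting it on the command line (`OMP_NUM_THREADS=1 nnUNetv2_train ...`) avoids touching the code at all, as confirmed earlier in the thread.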