Hi, I am hoping to get some assistance. Preprocessing of my dataset finished without any apparent problems, but I am running into trouble with training. For now I am only training the 2d configuration, and only fold 0, to check that everything works. I am running the job over an SSH connection to an HPC cluster.
The job fails with an error after about five minutes. I have pasted the full output below; could someone please help me understand what is going wrong and how to fix it?
/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:164: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None
############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################
Using device: cuda:0
#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################
2024-10-08 20:37:48.713970: do_dummy_2d_data_aug: False
2024-10-08 20:37:48.717379: Creating new 5-fold cross-validation split...
2024-10-08 20:37:48.742585: Desired fold for training: 0
2024-10-08 20:37:48.743828: This split has 7 training and 2 validation cases.
using pin_memory on device 0
/home/hornigmm/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:60: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
warnings.warn(
using pin_memory on device 0
2024-10-08 20:40:08.757191: Using torch.compile...
This is the configuration used by this training:
Configuration name: 2d
{'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 4, 'patch_size': [1152, 768], 'median_image_size_in_voxels': [1111.0, 713.0], 'spacing': [0.40350183844566356, 0.404444009065628], 'normalization_schemes': ['ZScoreNormalization'], 'use_mask_for_norm': [True], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'architecture': {'network_class_name': 'dynamic_network_architectures.architectures.unet.PlainConvUNet', 'arch_kwargs': {'n_stages': 8, 'features_per_stage': [32, 64, 128, 256, 512, 512, 512, 512], 'conv_op': 'torch.nn.modules.conv.Conv2d', 'kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'strides': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'n_conv_per_stage': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'conv_bias': True, 'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm2d', 'norm_op_kwargs': {'eps': 1e-05, 'affine': True}, 'dropout_op': None, 'dropout_op_kwargs': None, 'nonlin': 'torch.nn.LeakyReLU', 'nonlin_kwargs': {'inplace': True}}, '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']}, 'batch_dice': True}
These are the global plan.json settings:
{'dataset_name': 'Dataset701_SHOULDER', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [0.40444400906562805, 0.40350183844566356, 0.404444009065628], 'original_median_shape_after_transp': [1059, 1114, 656], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [2, 0, 1], 'transpose_backward': [1, 2, 0], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 284.0, 'mean': 61.74697494506836, 'median': 60.0, 'min': 0.0, 'percentile_00_5': 6.0, 'percentile_99_5': 132.0, 'std': 24.240419387817383}}}
2024-10-08 20:40:15.290197: unpacking dataset...
Traceback (most recent call last):
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
item = next(data_loader)
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next
return self.generate_train_batch()
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch
data, seg, properties = self._data.load_case(current_key)
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case
data = np.load(entry['data_file'][:-4] + ".npy", 'r')
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load
return format.open_memmap(file, mode=mmap_mode,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 4:
mmap length is greater than file size
Traceback (most recent call last):
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
item = next(data_loader)
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next
return self.generate_train_batch()
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch
data, seg, properties = self._data.load_case(current_key)
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case
seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load
return format.open_memmap(file, mode=mmap_mode,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 5:
mmap length is greater than file size
Traceback (most recent call last):
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
item = next(data_loader)
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next
return self.generate_train_batch()
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch
data, seg, properties = self._data.load_case(current_key)
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case
seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load
return format.open_memmap(file, mode=mmap_mode,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 3:
mmap length is greater than file size
Traceback (most recent call last):
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer
item = next(data_loader)
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next
return self.generate_train_batch()
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch
data, seg, properties = self._data.load_case(current_key)
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case
seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load
return format.open_memmap(file, mode=mmap_mode,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap
marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 2:
mmap length is greater than file size
/data1/pbs.10494714.pbs/tmpwfjp1sgm/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
/data1/pbs.10494714.pbs/tmpjdkfhlyo/main.c:6:23: fatal error: stdatomic.h: No such file or directory
#include <stdatomic.h>
^
compilation terminated.
2024-10-08 20:40:57.131245: unpacking done...
2024-10-08 20:40:57.162988: Unable to plot network architecture: nnUNet_compile is enabled!
2024-10-08 20:40:57.508343:
2024-10-08 20:40:57.509761: Epoch 0
2024-10-08 20:40:57.511209: Current learning rate: 0.01
Traceback (most recent call last):
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/bin/nnUNetv2_train", line 8, in
sys.exit(run_training_entry())
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/run/run_training.py", line 275, in run_training_entry
run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/run/run_training.py", line 211, in run_training
nnunet_trainer.run_training()
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1370, in run_training
train_outputs.append(self.train_step(next(self.dataloader_train)))
File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 994, in train_step
output = self.network(data)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn
return fn(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1116, in __call__
return self._torchdynamo_orig_callable(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 948, in __call__
result = self._inner_convert(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 472, in __call__
return _compile(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_utils_internal.py", line 84, in wrapper_function
return StrobelightCompileTimeProfiler.profile_compile_time(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time
return func(*args, **kwargs)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner
out_code = transform_code_object(code, transform)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object
transformations(instructions, code_options)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn
return fn(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform
tracer.run()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run
super().run()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run
while self.step():
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step
self.dispatch_table[inst.opcode](self, inst)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE
self._return(inst)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return
self.output.compile_subgraph(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1123, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in call
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/init.py", line 1951, in call
return compilefx(model, inputs_, config_patches=self.config)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(args, kwds)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx
return aot_autograd(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 69, in call
cg = aot_module_simplified(gm, example_inputs, self.kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified
compiledfn, = create_aot_dispatcher_function(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, *kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function
compiled_fn, fw_metadata = compiler_fn(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 461, in aot_dispatch_autograd
compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1410, in fw_compiler_base
return inner_compile(
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/debug.py", line 304, in inner
return fn(*args, **kwargs)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 527, in compile_fx_inner
compiled_graph = fx_codegen_and_compile(
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 831, in fx_codegen_and_compile
compiled_fn = graph.compile_to_fn()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1751, in compile_to_fn
return self.compile_to_module().call
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1680, in compile_to_module
self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1640, in codegen
self.scheduler.codegen()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper
r = func(*args, **kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2741, in codegen
self.get_backend(device).codegen_node(node)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 69, in codegen_node
return self._triton_scheduling.codegen_node(node)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1148, in codegen_node
return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1317, in codegen_node_schedule
src_code = kernel.codegen_kernel()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2159, in codegen_kernel
self.inductor_meta_common(),
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2047, in inductor_meta_common
"backend_hash": torch.utils._triton.triton_hash_with_backend(),
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/utils/_triton.py", line 63, in triton_hash_with_backend
backend = triton_backend()
File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/utils/_triton.py", line 49, in triton_backend
target = driver.active.get_current_target()
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in getattr
self._initialize_obj()
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in init
self.utils = CudaUtils() # TODO: make static
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in init
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
CalledProcessError: Command '['/usr/bin/gcc', '/data1/pbs.10494714.pbs/tmpjdkfhlyo/main.c', '-O3', '-shared', '-fPIC', '-o', '/data1/pbs.10494714.pbs/tmpjdkfhlyo/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/mnt/hpccs01/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/usr/lib64', '-L/usr/lib', '-I/mnt/hpccs01/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/data1/pbs.10494714.pbs/tmpjdkfhlyo', '-I/home/hornigmm/miniconda3/envs/nnunet_venv_2/include/python3.10']' returned non-zero exit status 1.
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
Exception in thread Thread-1 (results_loop):
Traceback (most recent call last):
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
raise e
File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message
PBS Job 10494714.pbs
CPU time : 00:27:53
Wall time : 00:05:47
Mem usage : 133554872kb
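In case it helps to narrow this down, below is a small standalone script I could run on the compute node to check the two things the log complains about: whether any of the unpacked .npy files in my preprocessed folder are truncated (the "mmap length is greater than file size" error), and whether the gcc that Triton/inductor invokes can actually find stdatomic.h. This is just my own sketch, not anything that ships with nnU-Net, and the preprocessed path is only illustrative for my Dataset701_SHOULDER setup.

import os
import subprocess
import tempfile

import numpy as np

# Illustrative path to the unpacked 2d data (data_identifier 'nnUNetPlans_2d');
# adjust to wherever nnUNet_preprocessed actually lives on the cluster.
PREPROCESSED_DIR = "/path/to/nnUNet_preprocessed/Dataset701_SHOULDER/nnUNetPlans_2d"

def check_npy_files(folder: str) -> None:
    """Open every .npy file memory-mapped, the same way the nnU-Net dataloader does."""
    for fname in sorted(os.listdir(folder)):
        if not fname.endswith(".npy"):
            continue
        path = os.path.join(folder, fname)
        try:
            arr = np.load(path, mmap_mode="r")
            print(f"OK   {fname}  shape={arr.shape}  dtype={arr.dtype}")
        except ValueError as e:
            # A truncated / partially written file raises
            # "mmap length is greater than file size" here.
            print(f"BAD  {fname}: {e}")

def check_stdatomic() -> None:
    """Try to compile a trivial C file that includes <stdatomic.h> with the system gcc."""
    src = "#include <stdatomic.h>\nint main(void) { return 0; }\n"
    with tempfile.TemporaryDirectory() as tmp:
        c_file = os.path.join(tmp, "check.c")
        with open(c_file, "w") as f:
            f.write(src)
        result = subprocess.run(
            ["gcc", c_file, "-o", os.path.join(tmp, "check")],
            capture_output=True, text=True,
        )
        print("gcc returncode:", result.returncode)
        if result.returncode != 0:
            print(result.stderr)

if __name__ == "__main__":
    check_npy_files(PREPROCESSED_DIR)
    check_stdatomic()

If files show up as BAD, I assume I could delete the unpacked .npy files (or re-run preprocessing) and let training unpack them again from the .npz archives, and if the gcc check fails I assume I need a newer compiler module loaded or to fall back to eager mode as the log suggests, but I would appreciate confirmation before relying on either of those.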