MIC-DKFZ / nnUNet


Training. Torch Error? #2533


maggie-may-22 commented 1 month ago

Hi, I am hoping to get some assistance. I have finished preprocessing my dataset, which seemed to go well. I am now having issues with training: for the moment I am only training the 2d configuration, and only fold 0, to check that everything works. I am running the job over SSH on an HPC cluster.

However, training fails with an error after about five minutes. I have pasted the full output below; could someone please help me understand what the issue is and how to fix it?
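For reference, this is roughly the command I am running (the dataset ID is taken from the Dataset701_SHOULDER plans shown in the log; my exact invocation may differ slightly):

```bash
# Train only the 2d configuration, fold 0, of dataset 701 (Dataset701_SHOULDER).
# Dataset ID and name are assumed from the plans printed in the log below.
nnUNetv2_train 701 2d 0
```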

```
/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py:164: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None

############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md
############################

Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2024-10-08 20:37:48.713970: do_dummy_2d_data_aug: False
2024-10-08 20:37:48.717379: Creating new 5-fold cross-validation split...
2024-10-08 20:37:48.742585: Desired fold for training: 0
2024-10-08 20:37:48.743828: This split has 7 training and 2 validation cases.
using pin_memory on device 0
/home/hornigmm/.local/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:60: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn(
using pin_memory on device 0
2024-10-08 20:40:08.757191: Using torch.compile...

This is the configuration used by this training: Configuration name: 2d {'data_identifier': 'nnUNetPlans_2d', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 4, 'patch_size': [1152, 768], 'median_image_size_in_voxels': [1111.0, 713.0], 'spacing': [0.40350183844566356, 0.404444009065628], 'normalization_schemes': ['ZScoreNormalization'], 'use_mask_for_norm': [True], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_data_or_seg_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'architecture': {'network_class_name': 'dynamic_network_architectures.architectures.unet.PlainConvUNet', 'arch_kwargs': {'n_stages': 8, 'features_per_stage': [32, 64, 128, 256, 512, 512, 512, 512], 'conv_op': 'torch.nn.modules.conv.Conv2d', 'kernel_sizes': [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3], [3, 3]], 'strides': [[1, 1], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]], 'n_conv_per_stage': [2, 2, 2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2, 2, 2], 'conv_bias': True, 'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm2d', 'norm_op_kwargs': {'eps': 1e-05, 'affine': True}, 'dropout_op': None, 'dropout_op_kwargs': None, 'nonlin': 'torch.nn.LeakyReLU', 'nonlin_kwargs': {'inplace': True}}, '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']}, 'batch_dice': True}

These are the global plan.json settings: {'dataset_name': 'Dataset701_SHOULDER', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [0.40444400906562805, 0.40350183844566356, 0.404444009065628], 'original_median_shape_after_transp': [1059, 1114, 656], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [2, 0, 1], 'transpose_backward': [1, 2, 0], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_properties_per_channel': {'0': {'max': 284.0, 'mean': 61.74697494506836, 'median': 60.0, 'min': 0.0, 'percentile_00_5': 6.0, 'percentile_99_5': 132.0, 'std': 24.240419387817383}}}

2024-10-08 20:40:15.290197: unpacking dataset...
Traceback (most recent call last):
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next return self.generate_train_batch()
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch data, seg, properties = self._data.load_case(current_key)
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 86, in load_case data = np.load(entry['data_file'][:-4] + ".npy", 'r')
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load return format.open_memmap(file, mode=mmap_mode,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 4: mmap length is greater than file size
Traceback (most recent call last):
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next return self.generate_train_batch()
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch data, seg, properties = self._data.load_case(current_key)
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load return format.open_memmap(file, mode=mmap_mode,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 5: mmap length is greater than file size
Traceback (most recent call last):
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next return self.generate_train_batch()
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch data, seg, properties = self._data.load_case(current_key)
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load return format.open_memmap(file, mode=mmap_mode,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 3: mmap length is greater than file size
Traceback (most recent call last):
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 53, in producer item = next(data_loader)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/data_loader.py", line 126, in next return self.generate_train_batch()
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/data_loader_2d.py", line 21, in generate_train_batch data, seg, properties = self._data.load_case(current_key)
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/dataloading/nnunet_dataset.py", line 97, in load_case seg = np.load(entry['data_file'][:-4] + "_seg.npy", 'r')
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/npyio.py", line 453, in load return format.open_memmap(file, mode=mmap_mode,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/lib/format.py", line 945, in open_memmap marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/hornigmm/.local/lib/python3.10/site-packages/numpy/core/memmap.py", line 268, in new mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: mmap length is greater than file size
Exception in background worker 2: mmap length is greater than file size
/data1/pbs.10494714.pbs/tmpwfjp1sgm/main.c:6:23: fatal error: stdatomic.h: No such file or directory

 #include <stdatomic.h>
                       ^
compilation terminated.
/data1/pbs.10494714.pbs/tmpjdkfhlyo/main.c:6:23: fatal error: stdatomic.h: No such file or directory
 #include <stdatomic.h>
                       ^

compilation terminated.
2024-10-08 20:40:57.131245: unpacking done...
2024-10-08 20:40:57.162988: Unable to plot network architecture: nnUNet_compile is enabled!
2024-10-08 20:40:57.508343:
2024-10-08 20:40:57.509761: Epoch 0
2024-10-08 20:40:57.511209: Current learning rate: 0.01
Traceback (most recent call last):
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/bin/nnUNetv2_train", line 8, in sys.exit(run_training_entry())
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/run/run_training.py", line 275, in run_training_entry run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/run/run_training.py", line 211, in run_training nnunet_trainer.run_training()
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1370, in run_training train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/mnt/hpccs01/home/hornigmm/nnUNet_Base_Folder/nnUNet/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 994, in train_step output = self.network(data)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, *kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 433, in _fn return fn(args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl return forward_call(*args, *kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1116, in call return self._torchdynamo_orig_callable(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 948, in call result = self._inner_convert(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 472, in call return _compile(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_utils_internal.py", line 84, in wrapper_function return StrobelightCompileTimeProfiler.profile_compile_time(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_strobelight/compile_time_profiler.py", line 129, in profile_compile_time return func(args, kwargs)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(*args, kwds)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 817, in _compile guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(*args, *kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 636, in compile_inner out_code = transform_code_object(code, transform)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1185, in transform_code_object transformations(instructions, code_options)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 178, in _fn return fn(args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 582, in transform tracer.run()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2451, in run super().run()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 893, in run while self.step():
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 805, in step self.dispatch_table[inst.opcode](self, inst)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2642, in RETURN_VALUE self._return(inst)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2627, in _return self.output.compile_subgraph(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1123, in compile_subgraph self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(*args, kwds)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1318, in compile_and_call_fx_graph compiled_fn = self.call_user_compiler(gm)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(*args, *kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1409, in call_user_compiler raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1390, in call_user_compiler compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in call compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/init.py", line 1951, in call return compilefx(model, inputs_, config_patches=self.config)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(args, kwds)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1505, in compile_fx return aot_autograd(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 69, in call cg = aot_module_simplified(gm, example_inputs, self.kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 954, in aot_module_simplified compiledfn, = create_aot_dispatcher_function(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(*args, *kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 687, in create_aot_dispatcher_function compiled_fn, fw_metadata = compiler_fn(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 461, in aot_dispatch_autograd compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1410, in fw_compiler_base return inner_compile(
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 84, in debug_wrapper inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/debug.py", line 304, in inner return fn(*args, kwargs)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(*args, *kwds)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(args, kwds)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(*args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 527, in compile_fx_inner compiled_graph = fx_codegen_and_compile(
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/contextlib.py", line 79, in inner return func(*args, *kwds)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 831, in fx_codegen_and_compile compiled_fn = graph.compile_to_fn()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1751, in compile_to_fn return self.compile_to_module().call
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1680, in compile_to_module self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1640, in codegen self.scheduler.codegen()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 231, in time_wrapper r = func(*args, kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2741, in codegen self.get_backend(device).codegen_node(node)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 69, in codegen_node return self._triton_scheduling.codegen_node(node)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1148, in codegen_node return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/simd.py", line 1317, in codegen_node_schedule src_code = kernel.codegen_kernel()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2159, in codegen_kernel self.inductor_meta_common(),
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/_inductor/codegen/triton.py", line 2047, in inductor_meta_common "backend_hash": torch.utils._triton.triton_hash_with_backend(),
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/utils/_triton.py", line 63, in triton_hash_with_backend backend = triton_backend()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/torch/utils/_triton.py", line 49, in triton_backend target = driver.active.get_current_target()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in getattr self._initialize_obj()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj self._obj = self._init_fn()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver return actives[0]()
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in init self.utils = CudaUtils() # TODO: make static
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in init mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in _build ret = subprocess.check_call(cc_cmd)
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/subprocess.py", line 369, in check_call raise CalledProcessError(retcode, cmd)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
CalledProcessError: Command '['/usr/bin/gcc', '/data1/pbs.10494714.pbs/tmpjdkfhlyo/main.c', '-O3', '-shared', '-fPIC', '-o', '/data1/pbs.10494714.pbs/tmpjdkfhlyo/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/mnt/hpccs01/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/usr/lib64', '-L/usr/lib', '-I/mnt/hpccs01/home/hornigmm/.local/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/data1/pbs.10494714.pbs/tmpjdkfhlyo', '-I/home/hornigmm/miniconda3/envs/nnunet_venv_2/include/python3.10']' returned non-zero exit status 1.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

Exception in thread Thread-1 (results_loop):
Traceback (most recent call last):
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run()
  File "/home/hornigmm/miniconda3/envs/nnunet_venv_2/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs)
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop raise e
  File "/home/hornigmm/.local/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

PBS Job 10494714.pbs
CPU time  : 00:27:53
Wall time : 00:05:47
Mem usage : 133554872kb
```
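If it helps with debugging the compile failure, I can re-run with the extra TorchDynamo logging suggested in the error message above. A sketch of what that re-run would look like, assuming the same invocation as before:

```bash
# Hypothetical re-run with verbose Dynamo logging, as suggested in the error
# message above; nothing else about the invocation is changed.
TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 nnUNetv2_train 701 2d 0
```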
