I keep getting a runtime error whenever training reaches the validation step. I have tried running it on Google Colab as well as on my local machine with an NVIDIA RTX 3090. What could be the reason? Any ideas? Thank you.
Detailed description
Val 0/4 0/250 , dice_tc: 1.0353405 , dice_wt: 0.96759117 , dice_et: 1.3043995 , time 6.37s
Val 0/4 1/250 , dice_tc: 1.0755885 , dice_wt: 1.1260056 , dice_et: 1.3453007 , time 3.67s
Error executing job with overrides: []
Traceback (most recent call last):
File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 478, in main
run(args, model=model,
File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 358, in run
) = trainer(
File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 249, in trainer
val_acc = val(model= model,
File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 138, in val
for index, batch_data in enumerate(loader):
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 36, in do_one_step
data = pin_memory(data, device)
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 62, in pin_memory
return type(data)({k: pin_memory(sample, device) for k, sample in data.items()}) # type: ignore[call-arg]
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 62, in <dictcomp>
return type(data)({k: pin_memory(sample, device) for k, sample in data.items()}) # type: ignore[call-arg]
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 57, in pin_memory
return data.pin_memory(device)
File "/home/navi/MONAI/monai/data/meta_tensor.py", line 282, in __torch_function__
ret = super().__torch_function__(func, types, args, kwargs)
File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/_tensor.py", line 1418, in __torch_function__
ret = func(*args, **kwargs)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
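For anyone trying to reproduce this with the `CUDA_LAUNCH_BLOCKING=1` hint from the message above, here is a minimal sketch of how the variable could be set from Python. The helper name `enable_blocking_launches` is hypothetical and not part of train.py; the sketch assumes it runs before `torch` is imported, since the variable only takes effect if it is set before CUDA is initialized.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 so CUDA kernel launches run synchronously.

    Hypothetical helper for illustration only. It must run before CUDA is
    initialized, so in practice before `import torch`.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_launches()

# Import torch (and the rest of the training code) only after this point,
# e.g.:
# import torch
```

The same effect can be had by exporting the variable in the shell before launching the script. This only makes the reported stack trace more reliable; it does not by itself address the out-of-memory condition.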
Steps to reproduce
Set all the directories correctly in the configuration file.