faizan1234567 / BraTS23-Tumors-Segmentation

Brain tumor segmentation on 3D MRI images. The model has been trained on the BraTS20 and BraTS21 datasets and is now being used with BraTS23.
MIT License

RuntimeError: CUDA error: out of memory #6

Closed: Navee402 closed this issue 2 weeks ago.

Navee402 commented 5 months ago

I constantly get a runtime error whenever training reaches the validation step. I have tried running it in Google Colab as well as on my local computer with an NVIDIA RTX 3090. What could be the reason? Any ideas? Thank you.

Detailed description

    Val 0/4 0/250 , dice_tc: 1.0353405 , dice_wt: 0.96759117 , dice_et: 1.3043995 , time 6.37s
    Val 0/4 1/250 , dice_tc: 1.0755885 , dice_wt: 1.1260056 , dice_et: 1.3453007 , time 3.67s
    Error executing job with overrides: []
    Traceback (most recent call last):
      File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 478, in main
        run(args, model=model,
      File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 358, in run
        ) = trainer(
      File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 249, in trainer
        val_acc = val(model= model,
      File "/home/navi/Brats-20-Tumors-segmentation/train.py", line 138, in val
        for index, batch_data in enumerate(loader):
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
        data = self._next_data()
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
        return self._process_data(data)
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
        data.reraise()
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/_utils.py", line 722, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in pin memory thread for device 0.
    Original Traceback (most recent call last):
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 36, in do_one_step
        data = pin_memory(data, device)
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 62, in pin_memory
        return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 62, in <dictcomp>
        return type(data)({k: pin_memory(sample, device) for k, sample in data.items()})  # type: ignore[call-arg]
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 57, in pin_memory
        return data.pin_memory(device)
      File "/home/navi/MONAI/monai/data/meta_tensor.py", line 282, in __torch_function__
        ret = super().__torch_function__(func, types, args, kwargs)
      File "/home/navi/miniconda3/envs/myenv/lib/python3.10/site-packages/torch/_tensor.py", line 1418, in __torch_function__
        ret = func(*args, **kwargs)
    RuntimeError: CUDA error: out of memory
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Steps to reproduce

  1. Set all the directories correctly in the configuration file.
  2. Run python train.py

faizan1234567 commented 5 months ago

Please reduce the batch size; it seems you don't have enough GPU memory.
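To see why the batch size matters, here is a rough back-of-the-envelope estimate. This sketch is not part of the repository; it assumes full-size 240 x 240 x 155 BraTS volumes, 4 input modalities, a 3-channel TC/WT/ET label map (suggested by the dice_tc/dice_wt/dice_et metrics in the log), and float32 storage. The helpers `batch_bytes` and `largest_batch_size` are hypothetical names used only for illustration.

```python
# Rough estimate of the memory held by one BraTS input batch.
# Assumptions (not taken from this repository): full-size 240 x 240 x 155
# volumes, 4 MRI modalities, a 3-channel TC/WT/ET label map, float32 storage.

BRATS_SHAPE = (240, 240, 155)  # voxels per volume (standard BraTS size)
IMAGE_CHANNELS = 4             # T1, T1ce, T2, FLAIR
LABEL_CHANNELS = 3             # TC, WT, ET (assumed multi-channel labels)
BYTES_PER_VALUE = 4            # float32


def batch_bytes(batch_size: int,
                shape: tuple = BRATS_SHAPE,
                channels: int = IMAGE_CHANNELS + LABEL_CHANNELS,
                bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to hold one batch of images plus labels (hypothetical helper)."""
    voxels = shape[0] * shape[1] * shape[2]
    return batch_size * channels * voxels * bytes_per_value


def largest_batch_size(budget_bytes: int, **kwargs) -> int:
    """Largest batch size whose raw tensors fit in budget_bytes (0 if none fit)."""
    per_sample = batch_bytes(1, **kwargs)
    return budget_bytes // per_sample


if __name__ == "__main__":
    for bs in (1, 2, 4):
        print(f"batch_size={bs}: {batch_bytes(bs) / 2**20:.1f} MiB of input tensors")
```

Memory grows linearly with the batch size, so halving the batch halves these tensors. The estimate counts only the raw inputs; a 3D network's activations typically need far more memory, so treat the result as a lower bound rather than a precise budget.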