Mgithus commented 1 year ago

I am working on a tumor segmentation task on the BraTS 2021 image dataset using the Swin UNETR model. For now, I am using only 5 samples. Each sample has 4 modalities, and each image is 3D (240 x 240 x 155). I am trying to run the following tutorial code without any changes: https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_brats21_segmentation_3d.ipynb

I am running it on a machine with the following configuration:
OS: Ubuntu 20.04.6 LTS
Processor: Intel® Xeon(R) CPU E5504 @ 2.00GHz × 4
Graphics: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti], 11 GB of GPU memory

At first, with CUDA 12.2 and torch 1.7+cu110, I was getting the following error:
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
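The message suggests CUDA_LAUNCH_BLOCKING=1 so that the traceback points at the kernel that actually failed. A minimal sketch, assuming the variable only takes effect if it is set before PyTorch initialises CUDA (setting it before `import torch` is the simplest way to ensure that). TORCH_USE_CUDA_DSA, by contrast, is a build-time option of PyTorch and cannot be switched on from a script.

```python
import os

# Must be set before PyTorch creates its CUDA context; doing it before
# importing torch is the simplest way to guarantee that.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

try:
    import torch
except ImportError:  # lets the snippet run where PyTorch is not installed
    torch = None

if torch is not None and torch.cuda.is_available():
    # With synchronous launches, a failing kernel raises at the line that
    # launched it instead of at some later, unrelated API call.
    x = torch.randn(2, 3, device="cuda")
    print((x * 2).sum().item())
```

From a shell, the equivalent is prefixing the command, e.g. `CUDA_LAUNCH_BLOCKING=1 python notebook_of_swin_unetr.py`.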

Then I reinstalled CUDA 11.0, as confirmed by nvcc --version:
(base) dlrs@spml3:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

nvidia-smi lists the following processes on the GPU:
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       803      G   /usr/lib/xorg/Xorg                          128MiB |
|    0   N/A  N/A      1106      G   /usr/bin/gnome-shell                         20MiB |
|    0   N/A  N/A      1171      G   /opt/teamviewer/tv_bin/TeamViewer             2MiB |
|    0   N/A  N/A      2364      G   /usr/lib/firefox/firefox                     13MiB |
|    0   N/A  N/A      4646      G   ...sion,SpareRendererForSitePerProcess       35MiB |
|    0   N/A  N/A     15064      G   ...959815738,826468481227333041,262144       31MiB |
|    0   N/A  N/A     55236      G   gnome-control-center                          2MiB |
+---------------------------------------------------------------------------------------+

With torch 1.7.1+cu110, Python 3.8 and MONAI 1.0.0 (running the tutorial with only the dataset paths changed), I got the following error:

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([2, 48, 128, 128, 128], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv3d(48, 48, kernel_size=[3, 3, 3], padding=[1, 1, 1], stride=[1, 1, 1], dilation=[1, 1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
    data_type = CUDNN_DATA_FLOAT
    padding = [1, 1, 1]
    stride = [1, 1, 1]
    dilation = [1, 1, 1]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x6c7e490
    type = CUDNN_DATA_FLOAT
    nbDims = 5
    dimA = 2, 48, 128, 128, 128,
    strideA = 100663296, 2097152, 16384, 128, 1,
output: TensorDescriptor 0xa210820
    type = CUDNN_DATA_FLOAT
    nbDims = 5
    dimA = 2, 48, 128, 128, 128,
    strideA = 100663296, 2097152, 16384, 128, 1,
weight: FilterDescriptor 0x6a8c150
    type = CUDNN_DATA_FLOAT
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 5
    dimA = 48, 48, 3, 3, 3,
Pointer addresses:
    input: 0x7fbb38000000
    output: 0x7fbb68000000
    weight: 0x7fbd159a9600
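The strideA values in the descriptor are just the contiguous (row-major) strides of the NCDHW input, counted in elements. A short sketch that reproduces them from dimA and computes the float32 size of one such tensor, which can help when reasoning about memory for this repro; the helper names are hypothetical.

```python
from math import prod

def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for a tensor shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return list(reversed(strides))

def tensor_bytes(shape, itemsize=4):
    """Size in bytes; itemsize=4 corresponds to float32 (CUDNN_DATA_FLOAT)."""
    return prod(shape) * itemsize

shape = [2, 48, 128, 128, 128]  # dimA of the input descriptor
print(contiguous_strides(shape))           # [100663296, 2097152, 16384, 128, 1]
print(tensor_bytes(shape) / 2**20, "MiB")  # 768.0 MiB
```

The input, the output, the upstream gradient passed to backward() and the input gradient all have this shape, so this single layer needs roughly 3 GiB before any cuDNN workspace is counted.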

I then changed the following settings in the script:
roi = (64, 64, 64) (was (128, 128, 128))
batch_size = 1 (was 2)
sw_batch_size = 1 (was 4)
fold = 1
infer_overlap = 0.5
max_epochs = 4 (was 100)
val_every = 2 (was 10)
When I ran the code with these settings, it became unresponsive.
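To see how roi and infer_overlap change the validation workload, one can estimate how many sliding windows cover one volume. A sketch that mirrors, as far as I understand it, the counting in MONAI's dense_patch_slices (scan interval = int(roi * (1 - overlap))); the function name is hypothetical, and it assumes the validation volume keeps its full 240 x 240 x 155 size.

```python
import math

def num_windows(image_size, roi_size, overlap):
    """Estimated number of sliding-window patches for one volume.

    Assumes roi <= image along every axis (MONAI pads the image otherwise).
    """
    total = 1
    for img, roi in zip(image_size, roi_size):
        if roi >= img:
            count = 1  # a single window covers the whole axis
        else:
            interval = max(int(roi * (1 - overlap)), 1)
            # windows start at 0, interval, 2*interval, ... until one reaches the end
            count = math.ceil((img - roi) / interval) + 1
        total *= count
    return total

print(num_windows((240, 240, 155), (128, 128, 128), 0.5))  # 18
print(num_windows((240, 240, 155), (64, 64, 64), 0.5))     # 196
```

With sw_batch_size = 1, the smaller roi therefore means roughly 11 times more (individually cheaper) forward passes per validation volume.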

Then I cleared the cache with torch.cuda.empty_cache() and ran the script again, and the following error was raised:

/tmp/tmp7q0y5gok
Fri Aug 11 07:06:10 2023 Epoch: 0
Epoch 0/4 0/4 loss: 0.9950 time 11.59s
Epoch 0/4 1/4 loss: 0.9968 time 0.43s
Epoch 0/4 2/4 loss: 0.9979 time 0.43s
Epoch 0/4 3/4 loss: 0.9984 time 0.43s
Final training 0/3 loss: 0.9984 time 13.05s
None of the inputs have requires_grad=True. Gradients will be None
Traceback (most recent call last):
  File "notebook_of_swin_unetr.py", line 429, in <module>
    ) = trainer(
  File "notebook_of_swin_unetr.py", line 363, in trainer
    val_acc = val_epoch(
  File "notebook_of_swin_unetr.py", line 289, in val_epoch
    logits = model_inferer(data)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/inferers/utils.py", line 180, in sliding_window_inference
    seg_prob_out = predictor(window_data, *args, **kwargs)  # batched patch segmentation
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 297, in forward
    hidden_states_out = self.swinViT(x_in, self.normalize)
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 1017, in forward
    x4 = self.layers4[0](x3.contiguous())
  File "/home/dlrs/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 874, in forward
    attn_mask = compute_mask([dp, hp, wp], window_size, shift_size, x.device)
  File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 779, in compute_mask
    img_mask[:, d, h, w, :] = cnt
RuntimeError: CUDA error: an illegal instruction was encountered

It became unresponsive again, and I restarted the terminal.


Printing MONAI config...

MONAI version: 1.0.0
Numpy version: 1.21.6
Pytorch version: 1.7.1+cu110
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 170093375ce29267e45681fcec09dfa856e1d7e7
MONAI __file__: /home/dlrs/.local/lib/python3.8/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 5.1.0
scikit-image version: 0.21.0
Pillow version: 10.0.0
Tensorboard version: 2.14.0
gdown version: 4.7.1
TorchVision version: 0.8.2+cu110
tqdm version: 4.66.1
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.3
einops version: 0.6.1
transformers version: 4.31.0
mlflow version: 2.5.0
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

Printing system config...

System: Linux
Linux version: Ubuntu 20.04.6 LTS
Platform: Linux-5.15.0-78-generic-x86_64-with-glibc2.17
Processor: x86_64
Machine: x86_64
Python version: 3.8.17
Process name: python
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: [popenfile(path='/home/dlrs/.anaconda/navigator/Code/logs/20230811T061228/ptyhost.log', fd=39, position=0, mode='a', flags=33793), popenfile(path='/snap/code/136/usr/share/code/resources/app/node_modules.asar', fd=41, position=64064, mode='r', flags=32768), popenfile(path='/snap/code/136/usr/share/code/v8_context_snapshot.bin', fd=103, position=0, mode='r', flags=32768)]
Num physical CPUs: 4
Num logical CPUs: 4
Num usable CPUs: 4
CPU usage (%): [100.0, 30.4, 35.8, 78.6]
CPU freq. (MHz): 1995
Load avg. in last 1, 5, 15 mins (%): [54.5, 78.2, 69.5]
Disk usage (%): 41.9
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 15.6
Available memory (GB): 6.6
Used memory (GB): 8.3

Printing GPU config...

Num GPUs: 1
Has CUDA: True
CUDA version: 11.0
cuDNN enabled: True
cuDNN version: 8005
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80']
GPU 0 Name: NVIDIA GeForce GTX 1080 Ti
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 28
GPU 0 Total memory (GB): 10.9
GPU 0 CUDA capability (maj.min): 6.1
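One quick sanity check on this output is whether the PyTorch build ships kernels the GPU can run. A minimal sketch, assuming the CUDA binary-compatibility rule that machine code built for sm_XY runs on devices with the same major version and a minor version of at least Y (PTX JIT compilation is ignored); the function name is hypothetical. On a live system the two inputs can come from torch.cuda.get_arch_list() and torch.cuda.get_device_capability().

```python
def has_compatible_sass(arch_list, capability):
    """True if some 'sm_XY' entry can run on a device with (major, minor).

    Binary (SASS) compatibility: same major version, built minor <= device minor.
    PTX entries ('compute_XY') and JIT compilation are not considered here.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX-only entries
        digits = "".join(ch for ch in arch[3:] if ch.isdigit())  # 'sm_90a' -> '90'
        built_major, built_minor = int(digits[:-1]), int(digits[-1])
        if built_major == major and built_minor <= minor:
            return True
    return False

# Values reported by print_debug_info() for the GTX 1080 Ti.
archs = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80"]
print(has_compatible_sass(archs, (6, 1)))  # True: sm_60 code runs on 6.1

if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print(has_compatible_sass(torch.cuda.get_arch_list(),
                                  torch.cuda.get_device_capability(0)))
```

For this machine the check passes, so a build that simply lacks kernels for the 1080 Ti does not seem to explain the illegal-instruction error on its own; it says nothing about driver or toolkit mismatches.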

KumoLiu commented 1 year ago

Hi @Mgithus,

(I run it by just adding paths to my dataset), I got following error RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

There are several factors that could cause this error. Could you please try the tutorial without changing the data path and see if it can work properly? If that works, could you please check your data first and see if they have all the proper shapes and labels? Thanks!
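A minimal sketch of the suggested data check, assuming BraTS 2021 conventions: four modalities with the same shape as the label, and label values drawn from {0, 1, 2, 4}. The helper names are hypothetical; loading uses nibabel (nib.load(...).shape and .dataobj), imported lazily so the checking logic itself runs without it.

```python
BRATS_LABELS = {0, 1, 2, 4}  # background, necrotic core, edema, enhancing tumor

def check_case(image_shapes, label_shape, label_values):
    """Return a list of problems found for one case (empty list = looks OK)."""
    problems = []
    if len(image_shapes) != 4:
        problems.append(f"expected 4 modalities, got {len(image_shapes)}")
    for i, shape in enumerate(image_shapes):
        if tuple(shape) != tuple(label_shape):
            problems.append(f"modality {i} shape {tuple(shape)} != label {tuple(label_shape)}")
    unexpected = set(label_values) - BRATS_LABELS
    if unexpected:
        problems.append(f"unexpected label values {sorted(unexpected)}")
    return problems

def load_and_check(image_paths, label_path):
    """Load NIfTI files with nibabel and run check_case on them."""
    import nibabel as nib
    import numpy as np

    shapes = [nib.load(p).shape for p in image_paths]
    label = np.asarray(nib.load(label_path).dataobj)
    values = {int(v) for v in np.unique(label)}
    return check_case(shapes, label.shape, values)
```

Running load_and_check on each of the five local cases and printing any non-empty result should quickly show a shape or label mismatch, if there is one.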

Mgithus commented 1 year ago


Sir, the BraTS dataset can't be accessed directly from Google Colab. It has to be downloaded from Kaggle or Synapse (the 2023 dataset also requires permission). So it's necessary to set the dataset path manually.
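Since the paths have to be set by hand, one option is to generate a small datalist for the local cases. A sketch, assuming the tutorial reads a JSON file with a "training" list whose entries carry "fold", "image" (four modality paths) and "label", and assuming the standard BraTS 2021 file naming (e.g. BraTS2021_00000_flair.nii.gz, BraTS2021_00000_seg.nii.gz). The case IDs, the modality order, the "TrainingData" prefix and the round-robin fold assignment are all hypothetical choices.

```python
import json
import os

MODALITIES = ["flair", "t1ce", "t1", "t2"]  # assumed order; keep it consistent

def make_datalist(case_ids, num_folds=5, prefix="TrainingData"):
    """Build a datalist dict for the given BraTS 2021 case IDs."""
    entries = []
    for i, case in enumerate(case_ids):
        folder = os.path.join(prefix, case)
        entries.append({
            "fold": i % num_folds,  # round-robin fold assignment
            "image": [os.path.join(folder, f"{case}_{m}.nii.gz") for m in MODALITIES],
            "label": os.path.join(folder, f"{case}_seg.nii.gz"),
        })
    return {"training": entries}

cases = ["BraTS2021_00000", "BraTS2021_00002", "BraTS2021_00003",
         "BraTS2021_00005", "BraTS2021_00006"]  # hypothetical local subset
datalist = make_datalist(cases)
print(json.dumps(datalist["training"][0], indent=2))
```

The dict can be saved with json.dump and the script's datalist path pointed at that file; with fold = 1, the second case would then be used for validation.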

I can't download the whole dataset to my Google Drive, because just storing it there requires 26.5 GB. So I can't run this code on the whole dataset on Google Colab either. The dataset information given in the tutorial notebook (https://github.com/Project-MONAI/tutorials/blob/main/3d_segmentation/swin_unetr_brats21_segmentation_3d.ipynb) is as follows:

Modality: MRI
Size: 1470 3D volumes (1251 training + 219 validation)
Each of the 1251 training samples contains 4 3D modalities and 1 3D segmentation mask (1251 x 5 = 6255 images in total).

Image shape: (240, 240, 155)
Label shape: (240, 240, 155)
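The network does not see this raw shape directly: training uses roi-sized crops and validation goes through sliding windows. As far as I know, MONAI's SwinUNETR requires each spatial size of img_size to be divisible by patch_size**5 (32 for the default patch_size of 2). A quick check with a hypothetical helper:

```python
def swin_unetr_size_ok(size, patch_size=2, stages=5):
    """True if every spatial dim is divisible by patch_size**stages."""
    factor = patch_size ** stages
    return all(s % factor == 0 for s in size)

print(swin_unetr_size_ok((64, 64, 64)))     # True
print(swin_unetr_size_ok((128, 128, 128)))  # True
print(swin_unetr_size_ok((240, 240, 155)))  # False: 240 and 155 are not multiples of 32
```

Both roi settings tried in this thread pass the check, whereas a full volume does not.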

vfdev-5 commented 1 year ago

@Mgithus the error means that indexing goes out of bounds here:

File "/home/dlrs/.local/lib/python3.8/site-packages/monai/networks/nets/swin_unetr.py", line 779, in compute_mask
img_mask[:, d, h, w, :] = cnt
RuntimeError: CUDA error: an illegal instruction was encountered

You can get a more informative error by running everything on the CPU; the message may be more explicit, e.g. "index N is out of bounds".
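To illustrate the difference: on the GPU an out-of-range index typically surfaces as an asynchronous CUDA error, whereas on the CPU PyTorch raises an immediate IndexError that names the index and the dimension size. A small sketch with a deliberately bad index (not the Swin UNETR code itself; the function name is hypothetical). For the real model, the equivalent step would be moving the model and a batch to torch.device("cpu") before calling it.

```python
def cpu_index_error_message():
    """Return PyTorch's CPU error text for an out-of-bounds write,
    or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    t = torch.zeros(4, device="cpu")
    try:
        t[5] = 1.0  # deliberately out of bounds: valid indices are 0..3
    except IndexError as err:
        return str(err)
    return ""

print(cpu_index_error_message())
```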

Project-MONAI/MONAI issue #6858: RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR, RuntimeError: CUDA error: an illegal instruction was encountered
