RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR

zxy630 commented 11 months ago

I tried to run ‘./deeplesion/eval.sh ./deeplesion/mconfigs/densenet_a3d.py ./deeplesion/model_weights/adap_7slice_weigts.pth’, but I get the error below. It has been bothering me for days.

Here is the info:
'''
./deeplesion/mconfigs/densenet_a3d.py a3d 7 slice
[ ] 0/160, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "./deeplesion/eval.py", line 210, in <module>
    main(checkpoint, cfg_path)
  File "./deeplesion/eval.py", line 196, in main
    outputs = single_gpu_test(model, dl)
  File "./deeplesion/eval.py", line 101, in single_gpu_test
    r = model(return_loss=False, rescale=False, **data)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 122, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/base.py", line 105, in forward_test
    return self.simple_test(imgs, img_metas, **kwargs)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 268, in simple_test
    x = self.extract_feat(img)
  File "/disk/user/zxy/project/AlignShift/mmdet/models/detectors/two_stage.py", line 92, in extract_feat
    x = self.backbone(img)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/models/truncated_densenet3d_a3d.py", line 168, in forward
    x = self.conv0(x)
  File "/disk/user/zxy/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/user/zxy/project/AlignShift/nn/operators/a3dconv.py", line 59, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR
'''

Any suggestions would be appreciated. Thanks so much.

zxy630 commented 11 months ago

My environment: PyTorch 1.3.1, torchvision 0.4.2, CUDA 10.1.243, tested on a machine with four 3090 GPUs.

NotF404 commented 10 months ago

Hi, this looks like a CUDA error, possibly caused by a corrupted PyTorch environment. Try running a single conv module to check whether the environment is in good condition. If it is not, reinstalling PyTorch may solve the problem.

zxy630 commented 10 months ago

I have tried torch 1.3.1, 1.5.0, 1.7.1, and 1.8.0, and the same problem occurs with each of them. Which versions did you test with (torch, CUDA, and GPU), if you don't mind sharing? Thanks.

NotF404 commented 10 months ago

The traceback you provided shows that torch cannot run the conv module successfully. Try running a single conv module to see whether torch works, like this:
'''
import torch

conv = torch.nn.Conv2d(4, 16, 3).cuda()  # small conv layer on the GPU
x = torch.rand(2, 4, 128, 128).cuda()    # B, C, H, W
y = conv(x)                              # fails here if the CUDA/cuDNN setup is broken
'''
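A slightly longer version of the same check can also say which part of the stack is at fault. The sketch below is only an illustration, not something from the AlignShift repository: `gpu_conv_report` is a hypothetical helper that records the torch, CUDA, and cuDNN versions, runs the convolution once with cuDNN enabled and once with it disabled, and compares both results to a CPU reference. The tolerance of `1e-2` is an assumed value chosen to be loose enough for reduced-precision GPU math.

```python
try:
    import torch
except ImportError:  # lets the script still report something without PyTorch
    torch = None


def gpu_conv_report():
    """Run one small convolution on the GPU, with and without cuDNN."""
    if torch is None:
        return {"status": "torch is not installed"}

    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # CUDA version this build targets
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
    }
    if not torch.cuda.is_available():
        report["status"] = "CUDA is not available"
        return report

    report["gpu"] = torch.cuda.get_device_name(0)
    report["capability"] = torch.cuda.get_device_capability(0)

    conv = torch.nn.Conv2d(4, 16, 3)
    x = torch.rand(2, 4, 128, 128)  # N, C, H, W
    was_enabled = torch.backends.cudnn.enabled
    try:
        with torch.no_grad():
            expected = conv(x)  # CPU reference, computed before moving the layer
            conv.cuda()         # moves the layer's parameters to the GPU in place
            for use_cudnn in (True, False):
                torch.backends.cudnn.enabled = use_cudnn
                result = conv(x.cuda()).cpu()
                key = "matches_cpu_cudnn_on" if use_cudnn else "matches_cpu_cudnn_off"
                report[key] = bool(torch.allclose(result, expected, atol=1e-2))
        report["status"] = "ok"
    except RuntimeError as exc:  # CUDA and cuDNN failures surface as RuntimeError
        report["status"] = "failed: " + str(exc).splitlines()[0]
    finally:
        torch.backends.cudnn.enabled = was_enabled
    return report


if __name__ == "__main__":
    for key, value in gpu_conv_report().items():
        print(key, "=", value)
```

If the run fails only with cuDNN enabled, the cuDNN library is the likelier suspect. If it fails both ways, the mismatch is probably between the GPU and the CUDA version of the torch build. One thing worth comparing in the output, offered as a possibility rather than a confirmed diagnosis for this issue: Ampere cards such as the RTX 3090 report capability (8, 6), which CUDA 10.x toolkits predate.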

zxy630 commented 10 months ago

Sorry, evaluation now runs fine, but training fails with the following error:
'''
Traceback (most recent call last):
  File "./deeplesion/train_dist.py", line 121, in <module>
    main(args)
  File "./deeplesion/train_dist.py", line 116, in main
    logger=logger)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/apis/train.py", line 68, in train_detector
    _dist_train(model, dataset, cfg, validate=validate)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/apis/train.py", line 204, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/mmcv/runner/runner.py", line 358, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/mmcv/runner/runner.py", line 260, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/zhangyi/workplace/AlignShiftv2/deeplesion/dataset/DeepLesionDataset_a3d.py", line 110, in __getitem__
    results = self.pre_pipeline(results)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/datasets/pipelines/compose.py", line 24, in __call__
    data1 = t(data)
  File "/home/zhangyi/workplace/AlignShiftv2/mmdet/datasets/pipelines/transforms.py", line 817, in __call__
    results = self.aug(**results)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/core/composition.py", line 158, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/core/transforms_interface.py", line 65, in __call__
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/transforms.py", line 513, in apply
    return F.shift_scale_rotate(img, angle, scale, dx, dy, interpolation, self.border_mode, self.value)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/functional.py", line 58, in wrapped_function
    result = func(img, *args, **kwargs)
  File "/home/zhangyi/anaconda3/envs/a3d/lib/python3.6/site-packages/albumentations/augmentations/functional.py", line 168, in shift_scale_rotate
    img = cv2.warpAffine(img, matrix, (width, height), flags=interpolation, borderMode=border_mode, borderValue=value)
cv2.error: OpenCV(4.1.0) /io/opencv/modules/imgproc/src/imgwarp.cpp:2597: error: (-215:Assertion failed) _src.channels() <= 4 || (interpolation != INTER_LANCZOS4 && interpolation != INTER_CUBIC) in function 'warpAffine'
'''

I have tried many OpenCV versions, but none of them work. Can you give me some tips?
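The assertion quoted in the OpenCV message states the condition directly: `warpAffine` accepts an image with more than four channels only when the interpolation is neither `INTER_CUBIC` nor `INTER_LANCZOS4`. The failure therefore means the augmented image had more than four channels and one of those two interpolation modes was requested. One possible workaround is to warp the image in groups of at most four channels and stitch the results back together. This is a minimal sketch under that assumption: `channel_groups` and `warp_in_groups` are hypothetical helpers, not part of albumentations, OpenCV, or this repository.

```python
def channel_groups(num_channels, max_per_group=4):
    """Return (start, stop) index pairs covering all channels, at most 4 each."""
    return [
        (start, min(start + max_per_group, num_channels))
        for start in range(0, num_channels, max_per_group)
    ]


def warp_in_groups(img, warp_fn, max_per_group=4):
    """Apply warp_fn to an H x W x C array in slices of <= max_per_group channels.

    warp_fn is any callable taking and returning an image array, for example
    lambda part: cv2.warpAffine(part, matrix, (width, height), flags=cv2.INTER_CUBIC)
    """
    import numpy as np  # imported lazily so channel_groups works without NumPy

    parts = []
    for start, stop in channel_groups(img.shape[2], max_per_group):
        part = warp_fn(np.ascontiguousarray(img[:, :, start:stop]))
        if part.ndim == 2:  # OpenCV drops the channel axis for 1-channel images
            part = part[:, :, None]
        parts.append(part)
    return np.concatenate(parts, axis=2)


if __name__ == "__main__":
    print(channel_groups(7))
```

A simpler alternative, if the quality difference is acceptable, is to switch the transform to an interpolation mode that the assertion allows, such as linear. Where that setting lives in this project's config is not shown in the thread.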

NotF404 commented 10 months ago

Check your albumentations version and use an OpenCV version that is compatible with it.
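To compare versions, it helps to print exactly what is installed in the active environment. The sketch below uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or newer; the traceback above comes from a Python 3.6 environment, where `pip list` serves the same purpose). The package list is an assumption about what is relevant here, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata  # standard library, Python 3.8+

# Distribution names as published on PyPI; adjust to the environment in question.
PACKAGES = (
    "albumentations",
    "opencv-python",
    "opencv-python-headless",
    "torch",
    "torchvision",
    "mmcv",
)


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:24s} {version or 'not installed'}")
```

Having both `opencv-python` and `opencv-python-headless` installed at once is worth noticing in the output, since both provide the same `cv2` module.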

M3DV / AlignShift

RuntimeError: cuDNN error: CUDNN_STATUS_MAPPING_ERROR #8