facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #1229

Open sadik1111 opened 4 years ago

sadik1111 commented 4 years ago

❓ Questions and Help

  File "tools/train_net.py", line 189, in <module>
    main()
  File "tools/train_net.py", line 182, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 88, in train
    arguments,
  File "/home/sadik/CenterMask/maskrcnn_benchmark/engine/trainer.py", line 83, in do_train
    loss_dict = model(images, targets)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/CenterMask/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 63, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/CenterMask/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 66, in forward
    x, detections, loss_mask, roi_feature, selected_mask, labels, maskiou_targets = self.mask(mask_features, proposals, targets)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/CenterMask/maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py", line 75, in forward
    x, roi_feature = self.feature_extractor(features, proposals)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/CenterMask/maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_feature_extractors.py", line 131, in forward
    x = F.relu(getattr(self, layer_name)(x))
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sadik/CenterMask/maskrcnn_benchmark/layers/misc.py", line 33, in forward
    return super(Conv2d, self).forward(x)
  File "/home/sadik/.conda/envs/centermask/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Environment

GPU: 2 Tesla T4
drivers: nvidia-410
cuda: 9.0
cudnn: 7.4

Versions of relevant libraries:

pytorch: 1.1.0-py3.7_cuda9.0.176_cudnn7.5.1_0 pytorch
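
The system cuDNN listed above (7.4) differs from the one named in the PyTorch build string (cudnn7.5.1). Conda builds of PyTorch ship with their own CUDA runtime and cuDNN, so the system-wide install is not necessarily what the training run uses. A minimal sketch for asking PyTorch which stack it actually loads is shown here; the helper name `report_torch_cuda_stack` is hypothetical, and the function returns early if `torch` is not importable.

```python
import importlib.util


def report_torch_cuda_stack():
    """Collect the CUDA/cuDNN versions PyTorch was built with, plus visible GPUs."""
    # Return a stub instead of raising when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}

    import torch

    info = {
        "torch": torch.__version__,
        # CUDA toolkit version the binary was compiled against (None for CPU builds).
        "cuda_build": torch.version.cuda,
        # Version of the cuDNN library PyTorch loads, as an integer (e.g. 7501), or None.
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # (device name, (major, minor) compute capability) for each visible GPU.
        info["devices"] = [
            (torch.cuda.get_device_name(i), torch.cuda.get_device_capability(i))
            for i in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in report_torch_cuda_stack().items():
        print(f"{key}: {value}")
```

Running this inside the same conda environment as `tools/train_net.py` shows the versions that matter for the failing convolution, independent of what is installed under `/usr/local/cuda`.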

Thank you for any help!

kaiguoscut commented 4 years ago

I have the same problem as you.

python webcam.py --config-file "../experiements/cfgs/e2e_faster_rcnn_R_50_C4_1x.yaml"
Traceback (most recent call last):
  File "webcam.py", line 84, in <module>
    main()
  File "webcam.py", line 78, in main
    composite = coco_demo.run_on_opencv_image(img)
  File "/home/guo/maskrcnn-benchmark/demo/predictor.py", line 209, in run_on_opencv_image
    predictions = self.compute_prediction(image)
  File "/home/guo/maskrcnn-benchmark/demo/predictor.py", line 242, in compute_prediction
    predictions = self.model(image_list)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 52, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 26, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 47, in forward
    x = self.feature_extractor(features, proposals)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py", line 45, in forward
    x = self.head(x)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/resnet.py", line 203, in forward
    x = getattr(self, stage)(x)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/container.py", line 97, in forward
    input = module(input)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/modeling/backbone/resnet.py", line 331, in forward
    out = self.conv2(out)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/guo/maskrcnn-benchmark/maskrcnn_benchmark/layers/misc.py", line 33, in forward
    return super(Conv2d, self).forward(x)
  File "/home/guo/anaconda3/envs/maskrcnn/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 339, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Also, when I try to check the cuDNN version, cudnn.h cannot be found:

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
cat: /usr/local/cuda/include/cudnn.h: No such file or directory

Environment

GPU: RTX 2080
drivers: 450.66
cuda: 9.0
cudnn: 7.5.1

Versions of relevant libraries

pytorch 1.1.0 py3.6_cuda9.0.176_cudnn7.5.1_0 pytorch
pytorch-nightly 1.0.0.dev20190328 py3.6_cuda9.0.176_cudnn7.4.2_0 pytorch

igygi commented 4 years ago

Hi,

I encountered the same problem. Were you able to resolve this?

Here are my environment details:

System

GPU: RTX 2080Ti
drivers: 440.33
cuda: 9.0
cudnn: 7.6.4

Versions of relevant libraries

pytorch-nightly 1.0.0.dev20190328
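
Every environment reported in this thread so far pairs a Turing GPU (Tesla T4, RTX 2080, RTX 2080 Ti; compute capability 7.5) with CUDA 9.0. Toolkit support for Turing arrived with CUDA 10.0, so this mismatch may be worth ruling out, although nobody in the thread has confirmed it as the cause. The sketch below is only a lookup over a small hand-written table; `MIN_CUDA_FOR_CAPABILITY`, `parse_version`, and `toolkit_supports` are hypothetical names, not part of PyTorch or maskrcnn-benchmark.

```python
# First CUDA toolkit release that can target each compute capability.
# Hand-written and intentionally incomplete; extend it as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta, e.g. Tesla V100
    (7, 5): (10, 0),  # Turing, e.g. Tesla T4, RTX 2080, RTX 2080 Ti
}


def parse_version(text):
    """Turn '9.0' or '9.0.176' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, capability):
    """Return True if the given CUDA toolkit version can target the GPU."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_CAPABILITY[capability]


if __name__ == "__main__":
    # Values taken from the environments posted in this thread.
    for gpu in ("Tesla T4", "RTX 2080", "RTX 2080 Ti"):
        ok = toolkit_supports("9.0", (7, 5))
        print(f"{gpu} with CUDA 9.0: {'supported' if ok else 'not supported'}")
```

If the check reports a mismatch, one option to try is a PyTorch build compiled against CUDA 10.0 or later, then re-running the failing script.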

Naitik1502 commented 2 years ago

Did you guys resolve it?