facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/username/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103 #141

Open cvemeki opened 5 years ago

cvemeki commented 5 years ago

❓RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/username/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103

Hello, when I run Mask_R-CNN_demo.ipynb in 'cuda' mode, everything works until the last cell. The error occurs when running predictions = coco_demo.run_on_opencv_image(image).

To find the cause, I tried the following, which left me more confused. The output of collect_env.py is:

Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.1.85
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] Could not collect

I also ran ./deviceQuery from NVIDIA_CUDA-9.1_Samples/bin/x86_64/linux/release, and it reports PASS. More concretely:

deviceQuery, 
CUDA Driver = CUDART, 
CUDA Driver Version = 9.1, 
CUDA Runtime Version = 9.1, 
NumDevs = 2
Result = PASS

All of the tests above show that the CUDA runtime version matches the CUDA driver version, so I don't know where this runtime error comes from or how to fix it.

Thank you.
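
The outputs above describe the system toolkit and driver, while the collect_env.py listing names two different PyTorch installs (a 1.0.0.dev nightly and a pip3 torch 0.4.1). One way to see which build the notebook's interpreter actually imports, and which CUDA version that build targets, is a sketch like the one below. The helper name cuda_versions_seen_by_python is hypothetical. It relies only on torch.__version__, torch.version.cuda, torch.cuda.is_available() and the nvcc --version command, and it reports None for anything that is not installed.

```python
import importlib
import shutil
import subprocess


def cuda_versions_seen_by_python():
    """Collect the CUDA-related versions visible from this interpreter.

    Hypothetical helper for illustration; every value is None when the
    corresponding component cannot be found.
    """
    info = {
        "torch": None,           # version of the torch package that gets imported
        "torch_cuda": None,      # CUDA version that torch build was compiled against
        "cuda_available": None,  # whether torch can initialise a CUDA device
        "nvcc_release": None,    # "release" line printed by the nvcc found on PATH
    }

    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        torch = None

    if torch is not None:
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        try:
            result = subprocess.run(
                [nvcc, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
        except OSError:
            result = None
        if result is not None:
            # Keep only the line that names the toolkit release.
            for line in result.stdout.splitlines():
                if "release" in line:
                    info["nvcc_release"] = line.strip()
                    break

    return info


if __name__ == "__main__":
    for key, value in cuda_versions_seen_by_python().items():
        print(key, "=", value)
```

Running this inside the same conda environment and kernel as the notebook should show whether the imported torch and the nvcc on PATH agree on the CUDA release.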

fmassa commented 5 years ago

From searching for this error online, it might be a conflict between CUDA versions and drivers: https://github.com/torch/cutorch/issues/809 https://devtalk.nvidia.com/default/topic/1028320/cuda-driver-version-is-insufficient-for-cuda-runtime-version/?offset=6

I'm not sure what the best solution would be here, but maybe downgrading to CUDA 9.0 or updating to CUDA 9.2 would fix it?
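
Before switching toolkits, it may help to check whether the installed driver is new enough for the target CUDA release. The sketch below is only an illustration: the function names are hypothetical, and the minimum Linux driver versions in the table are recalled from NVIDIA's CUDA Toolkit release notes, so they are an assumption to verify against the official table.

```python
# Minimum Linux x86_64 driver version per CUDA release.
# ASSUMPTION: values recalled from NVIDIA's CUDA Toolkit release notes;
# check them against the official compatibility table before relying on them.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
}


def parse_version(text):
    """Turn a dotted version string such as '390.67' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, cuda_release):
    """Return True if the driver meets the table's minimum for that release."""
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[cuda_release]


if __name__ == "__main__":
    driver = "390.67"  # driver version reported by collect_env.py in this thread
    for release in sorted(MIN_LINUX_DRIVER):
        print("CUDA", release, "->", driver_supports(driver, release))
```

Under the assumed table, driver 390.67 satisfies the minimum for CUDA 9.0 and 9.1 but not for 9.2, so moving to CUDA 9.2 would probably also require a driver update.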

cvemeki commented 5 years ago

I downgraded to CUDA 9.0, but that still doesn't help:

Collecting environment information...
PyTorch version: 1.0.0.dev20181106
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 6.4.0-17ubuntu1) 6.4.0 20180424
CMake version: version 3.10.2

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti

Nvidia driver version: 390.67
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a
/usr/local/cuda-9.0/lib64/libcudnn.so.7.3.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch (0.4.1)
[conda] pytorch-nightly           1.0.0.dev20181106 py3.7_cuda9.0.176_cudnn7.1.2_0    pytorch

cvemeki commented 5 years ago

Here is the traceback:

RuntimeError                              Traceback (most recent call last)
<ipython-input-51-17ba477dc913> in <module>
      1 # compute predictions
----> 2 predictions = coco_demo.run_on_opencv_image(image)
      3 imshow(predictions)
      4 get_ipython().system('nvcc -V')

~/github/maskrcnn-benchmark/demo/predictor.py in run_on_opencv_image(self, image)
    167                 the BoxList via `prediction.fields()`
    168         """
--> 169         predictions = self.compute_prediction(image)
    170         top_predictions = self.select_top_predictions(predictions)
    171 

~/github/maskrcnn-benchmark/demo/predictor.py in compute_prediction(self, original_image)
    198         # compute predictions
    199         with torch.no_grad():
--> 200             predictions = self.model(image_list)
    201         predictions = [o.to(self.cpu_device) for o in predictions]
    202 

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py in forward(self, images, targets)
     48         images = to_image_list(images)
     49         features = self.backbone(images.tensors)
---> 50         proposals, proposal_losses = self.rpn(images, features, targets)
     51         if self.roi_heads:
     52             x, result, detector_losses = self.roi_heads(features, proposals, targets)

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in forward(self, images, features, targets)
     94             return self._forward_train(anchors, objectness, rpn_box_regression, targets)
     95         else:
---> 96             return self._forward_test(anchors, objectness, rpn_box_regression)
     97 
     98     def _forward_train(self, anchors, objectness, rpn_box_regression, targets):

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/rpn.py in _forward_test(self, anchors, objectness, rpn_box_regression)
    120 
    121     def _forward_test(self, anchors, objectness, rpn_box_regression):
--> 122         boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
    123         if self.cfg.MODEL.RPN_ONLY:
    124             # For end-to-end models, the RPN proposals are an intermediate state

~/anaconda2/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward(self, anchors, objectness, box_regression, targets)
    136         anchors = list(zip(*anchors))
    137         for a, o, b in zip(anchors, objectness, box_regression):
--> 138             sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
    139 
    140         boxlists = list(zip(*sampled_boxes))

~/github/maskrcnn-benchmark/maskrcnn_benchmark/modeling/rpn/inference.py in forward_for_single_feature_map(self, anchors, objectness, box_regression)
    116                 self.nms_thresh,
    117                 max_proposals=self.post_nms_top_n,
--> 118                 score_field="objectness",
    119             )
    120             result.append(boxlist)

~/github/maskrcnn-benchmark/maskrcnn_benchmark/structures/boxlist_ops.py in boxlist_nms(boxlist, nms_thresh, max_proposals, score_field)
     25     boxes = boxlist.bbox
     26     score = boxlist.get_field(score_field)
---> 27     keep = _box_nms(boxes, score, nms_thresh)
     28     if max_proposals > 0:
     29         keep = keep[: max_proposals]

RuntimeError: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /home/tianchuzhang/github/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/nms.cu:103

lanpa commented 5 years ago

Have you tried removing all torch-related libraries and then reinstalling with conda install pytorch-nightly cuda92 -c pytorch?

fmassa commented 5 years ago

I'd try doing what @lanpa mentioned. I don't really have any other suggestions :-/